Abstract

One of the most basic problems in compressed sensing is solving an under-determined system of linear
equations. Although this problem seems rather hard, a certain $\ell_1$-optimization algorithm appears to be very
successful in solving it. The recent work of [14,28] rigorously proved (in a large dimensional and statistical
context) that if the number of equations (measurements in the compressed sensing terminology) in the system
is proportional to the length of the unknown vector then there is a sparsity (number of non-zero elements
of the unknown vector) also proportional to the length of the unknown vector such that the $\ell_1$-optimization
algorithm succeeds in solving the system. In more recent papers [78,81] we considered the setup of the so-called
block-sparse unknown vectors. In a large dimensional and statistical context, we determined sharp lower
bounds on the values of allowable sparsity for any given number (proportional to the length of the unknown
vector) of equations such that an $\ell_2/\ell_1$-optimization algorithm succeeds in solving the system. The results
established in [78,81] assumed a fairly large block-length of the block-sparse vectors. In this paper we
consider the block-length to be a parameter of the system. Consequently, we then establish sharp lower bounds
on the values of the allowable block-sparsity as functions of the block-length.

Index Terms: Compressed sensing; Block-sparse; $\ell_2/\ell_1$-optimization.



1 Introduction 

In the last several years the area of compressed sensing has been the subject of extensive research. Finding the
sparsest solution of an under-determined system of linear equations turns out to be one of the focal points of
the entire area. Recent phenomenal results of [14] and [28] rigorously proved for the first time that in certain
scenarios one can solve an under-determined system of linear equations by solving a linear program in
polynomial time. These breakthrough results then, as expected, generated an enormous amount of research with
possible applications ranging from high-dimensional geometry, image reconstruction, single-pixel camera
design, decoding of linear codes, channel estimation in wireless communications, to machine learning,
data-streaming algorithms, DNA micro-arrays, magneto-encephalography etc. (more on the compressed
sensing problems, their importance, and wide spectrum of different applications can be found in excellent
references [4, 12, 15, 24, 37, 58, 60, 66, 68, 70, 71, 91, 93]).

The interest of the present paper is in the mathematical aspects of certain compressed sensing problems.
More precisely, we will be interested in finding the sparsest solution of an under-determined system of linear
equations which, as mentioned above, is one of the most fundamental problems in compressed sensing.
While the setup of this problem is fairly easy, its solution is rather hard. Namely, the setup of the problem is
as simple as the following: we would like to find x such that

Ax = y (1)

where A is an M x N (M < N) measurement matrix and y is an M x 1 measurement vector. In the usual
compressed sensing context x is an N x 1 unknown K-sparse vector (see Figure 1). This assumes that x
has at most K nonzero components (we assume ideally sparse signals; more on the so-called approximately
sparse signals can be found in e.g. [21,79,84,95]). In the rest of the paper we will also assume the so-called
linear regime, i.e. we will assume that $K = \beta N$ and that the number of the measurements is $M = \alpha N$
where $\alpha$ and $\beta$ are absolute constants independent of N (more on the non-linear regime, i.e. on the regime
when M is larger than linearly proportional to K can be found in e.g. [22,45,46]). Since the problem given
in (1) has been known for a long time there is an extensive literature related to possible ways for solving
it. If one has freedom to design the measurement matrix A then, clearly, a particular recovery algorithm
for that design can be developed as well. As shown in [3, 59, 65], the techniques from coding theory (based
on the coding/decoding of Reed-Solomon codes) can be employed to determine any K-sparse x in (1) for
any $\alpha$ and any $\beta \le \frac{\alpha}{2}$ in polynomial time. It is easy to see that $\beta$ can not be greater than $\frac{\alpha}{2}$ for x to be
uniquely recoverable. Therefore, in terms of recoverable sparsity in polynomial time, the results from [3, 59, 65]
are optimal. The complexity of algorithms from [3,59,65] is roughly $O(N^3)$. If A is designed based on the
techniques related to the coding/decoding of Expander codes then the complexity of recovering x in (1) is
$O(N)$ (see e.g. [52,53,94] and references therein). However, these algorithms do not allow for $\beta$ to be as
large as $\frac{\alpha}{2}$.

Figure 1: Model of a linear system; vector x is K-sparse

On the other hand, if there is no freedom in the choice of the matrix A the problem becomes NP-hard.
Two algorithms that traditionally perform well and have been the subject of extensive research in
recent years are 1) Orthogonal matching pursuit (OMP) and 2) Basis pursuit (BP), i.e. $\ell_1$-optimization.
Both of the algorithms have advantages and disadvantages when applied to different problem scenarios. As
expected, a very extensive literature has been developed (especially in the last several years) that covers various
modifications of both algorithms so as to emphasize their strengths and neutralize their flaws. However, a short
assessment of their differences would be that OMP is faster while BP can recover higher sparsity and
is more resistant to system imperfections. Under certain probabilistic assumptions on the elements of the
matrix A it can be shown (see e.g. [62, 63, 86, 88]) that if $\alpha = O(\beta \log(\frac{1}{\beta}))$ OMP (or a slightly modified
OMP) can recover x in (1) with complexity of recovery $O(N^2)$. On the other hand a stage-wise OMP
from [36] recovers x in (1) with complexity of recovery $O(N \log N)$.

Since the results of this paper will in some sense be related to $\ell_1$-optimization (considered in [14, 15, 28,
34]), below we briefly recall its definition. The basic $\ell_1$-optimization algorithm (more on adaptive versions
of basic $\ell_1$-optimization can be found in e.g. [16, 19, 76]) finds x in (1) by solving the following problem

$$\begin{aligned} \min\ & \|x\|_1 \\ \text{subject to}\ & Ax = y. \end{aligned} \qquad (2)$$

(Instead of $\ell_1$-optimization one can employ $\ell_q$-optimization, $0 < q < 1$, which essentially means that instead
of norm 1 one can use norm q in (2). However, the resulting problem becomes non-convex. A good overview
of that approach can be found in e.g. [26,43,48-50,75] and references therein.) Quite remarkably, in [15]
the authors were able to show that if $\alpha$ and N are given, the matrix A is given and satisfies a special property
called the restricted isometry property (RIP), then any unknown vector x with no more than $K = \beta N$
(where $\beta$ is an absolute constant dependent on $\alpha$ and explicitly calculated in [15]) non-zero elements can
be recovered by solving (2). As expected, this assumes that y was in fact generated by that x and given to
us. The case when the available measurements are noisy versions of y is also of interest [14, 15, 51, 92]. We
mention in passing that the recent popularity of $\ell_1$-optimization in compressed sensing is significantly due
to its robustness with respect to noisy measurements. (Of course, the main reason for its popularity is its
ability to solve (1) for a very wide range of matrices A; more on this remarkable universality phenomenon
the interested reader can find in [33].)

Since the RIP condition played a crucial role in the proving technique of [14, 15], having the matrix A
satisfy the RIP condition is fundamentally important. (More on the importance of the RIP condition can
be found in [13].) Designing deterministic matrices for which the RIP condition would hold, as well as
checking if it holds for any given matrix, is a very hard problem. However, for several classes of random
matrices (e.g., matrices with i.i.d. zero mean Gaussian, Bernoulli or even general sub-Gaussian components)
it turns out that for certain dimensions of the system the RIP condition is satisfied with overwhelming
probability [1, 5, 15, 73]. On the other hand, it should also be pointed out that the RIP is only a sufficient
condition for $\ell_1$-optimization to produce the solution of (1). In turn this means that an analysis of the success
of $\ell_1$-optimization is not required to rely on it.

In fact, the final results and brilliant analysis of [27,28] do not rely on the validity of the RIP condition.
Namely, in [27, 28] the author considers the polytope obtained by projecting the regular N-dimensional
cross-polytope using the matrix A. It turns out that a necessary and sufficient condition for (2) to produce the
solution of (1) for any given x is that this polytope associated with the matrix A is K-neighborly [27-30].
Using the results of [2, 10, 72, 90], it is further shown in [28] that if the matrix A is a random M x N
ortho-projector matrix then with overwhelming probability the polytope obtained by projecting the standard
N-dimensional cross-polytope by A is K-neighborly. The precise relation between M and K in order for this
to happen is characterized in [27, 28] as well.

It should be noted that one usually considers success of (2) in finding the solution of (1) for any given x.
It is also of interest to consider success of (2) in finding the solution of (1) for almost any given x. To make a
distinction between these two cases we will in the following section recall several important definitions
from [28,29,31].

Before proceeding further we first in the following section introduce the so-called block-sparse signals 
that will be the central topic of this paper. Immediately afterwards we also describe a polynomial algorithm 
for their efficient recovery. 

2 Block-sparse signals and $\ell_2/\ell_1$-algorithm

What we described in the previous section is the standard compressed sensing setup. Such a setup does not
assume any special structure on the unknown K-sparse signal x. However, one may encounter applications
where the signal x, in addition to being sparse, has a certain structure. The so-called block-sparse signals were
introduced and their applications and recovery algorithms were investigated in [4, 17, 38-40, 44, 65, 78, 81, 83].
A related problem of recovering jointly sparse signals and its applications were considered in [6, 9, 18, 23, 41,
61, 64, 85, 87, 89, 91, 97, 98] and references therein (more on different types of a priori known signal structure
can also be found in [55, 56, 96]). In all these cases one attempts to improve the recoverability potential
of the standard algorithms described in the previous section by incorporating the knowledge of the signal
structure.

In this paper we will be interested in further investigating the so-called block-sparse compressed sensing
problems [4, 40, 65, 78, 81, 83]. To introduce block-sparse signals and facilitate the subsequent exposition
we will assume that integers N and d are chosen such that $n = \frac{N}{d}$ is an integer and it represents the total
number of blocks that x consists of. Clearly d is the length of each block. Furthermore, we will assume
that $m = \frac{M}{d}$ is an integer as well and that $X_i = x_{(i-1)d+1:id}$, $1 \le i \le n$, are the n blocks of x (see Figure
2). Then we will call any signal x k-block-sparse if at most $k = \frac{K}{d}$ of its blocks $X_i$ are non-zero (a non-zero
block is a block that is not a zero block; a zero block is a block that has all elements equal to zero).

Figure 2: Block-sparse model ($A_1$: columns $1, 2, \dots, d$; $A_i$: columns $id-d+1, id-d+2, \dots, id$; $A_n$: columns $nd-d+1, nd-d+2, \dots, nd$)

Since k-block-sparse signals are K-sparse one could then use (2) to recover the solution of (1). While this is
possible, it clearly uses the block structure of x in no way. To exploit the block structure of x, in [83] the
following polynomial algorithm (essentially a combination of $\ell_2$ and $\ell_1$ optimizations) was considered (see
also e.g. [4, 39, 89, 97, 98])



$$\begin{aligned} \min\ & \sum_{i=1}^{n} \|x_{(i-1)d+1:id}\|_2 \\ \text{subject to}\ & Ax = y. \end{aligned} \qquad (3)$$
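
To make the objective in (3) concrete, the following minimal Python sketch evaluates the mixed $\ell_2/\ell_1$ norm of a vector and counts its non-zero blocks. The function names `block_norms`, `l2_l1_objective`, and `block_sparsity` are illustrative choices and are not taken from [83]; the sketch only evaluates the objective and does not solve (3).

```python
import math

def block_norms(x, d):
    """Return the Euclidean norms of the consecutive length-d blocks of x."""
    if d <= 0 or len(x) % d != 0:
        raise ValueError("len(x) must be a multiple of the positive block length d")
    return [math.sqrt(sum(v * v for v in x[i:i + d])) for i in range(0, len(x), d)]

def l2_l1_objective(x, d):
    """Objective of (3): the sum of the l2 norms of the blocks of x."""
    return sum(block_norms(x, d))

def block_sparsity(x, d):
    """Number of non-zero blocks, i.e. the smallest k for which x is k-block-sparse."""
    return sum(1 for norm in block_norms(x, d) if norm > 0.0)
```

For x = (3, 4, 0, 0, 0, 0) and d = 2 the objective equals 5, whereas the plain $\ell_1$ norm of the same vector equals 7; the mixed norm charges each block only through its length, which is how (3) takes the block structure into account.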

Extensive simulations in [83] demonstrated that as d grows the algorithm in (3) significantly outperforms the
standard $\ell_1$. The following was shown in [83] as well: let A be an M x N matrix with a basis of null-space
comprised of i.i.d. Gaussian elements; if $\alpha = \frac{M}{N} \to 1$ then there is a constant d such that all k-block-sparse
signals x with sparsity $K \le \beta N$, $\beta \to \frac{1}{2}$, can be recovered with overwhelming probability by solving (3).
The precise relation between d and how fast $\alpha \to 1$ and $\beta \to \frac{1}{2}$ was quantified in [83] as well. In [78, 81]
we extended the results from [83] and obtained the values of the recoverable block-sparsity for any $\alpha$, i.e.
for $0 < \alpha < 1$. More precisely, for any given constant $0 < \alpha < 1$ we in [78, 81] determined a constant
$\beta = \frac{K}{N}$ such that for a sufficiently large d (3) with overwhelming probability recovers any k-block-sparse
signal with sparsity less than K. (Under overwhelming probability we in this paper assume a probability
that is no more than a number exponentially decaying in N away from 1.)

Clearly, for any given constant $\alpha < 1$ there is a maximum allowable value of the constant $\beta$ such that
(3) finds the solution of (1) with overwhelming probability for any x. This maximum allowable value of the
constant $\beta$ is called the strong threshold (see [27,28]). We will denote the value of the strong threshold by
$\beta_s$. Similarly, for any given constant $\alpha < 1$ one can define the sectional threshold as the maximum allowable
value of the constant $\beta$ such that (3) finds the solution of (1) with overwhelming probability for any x with
a given fixed location of non-zero blocks (see [27, 28]). In a similar fashion one can then denote the value
of the sectional threshold by $\beta_{sec}$. Finally, for any given constant $\alpha < 1$ one can define the weak threshold
as the maximum allowable value of the constant $\beta$ such that (3) finds the solution of (1) with overwhelming
probability for any x with a given fixed location of non-zero blocks and given fixed directions of non-zero
block vectors $X_i$ (see [27,28]). In a similar fashion one can then denote the value of the weak threshold by
$\beta_w$.
While [78, 81] provided fairly sharp threshold values, they did so in a somewhat asymptotic sense.
Namely, the analysis presented in [78,81] assumed fairly large values of block-length d. As such the analysis
in [78, 81] provided an ultimate performance limit of $\ell_2/\ell_1$-optimization rather than its performance
characterization as a function of a particular fixed block-length. In this paper we extend the results from
[78, 81] so that the threshold values are now functions of a fixed block-length d. Our analysis will use
some ingredients of the analysis presented in [78, 81]. However, significantly more precise estimates of
certain quantities will be necessary to account for a fixed block-length. These estimates will be obtained
in a fashion similar to the one presented in [82]. In addition to the strong thresholds (which were the
main concern of [78, 81]), we will also determine attainable values for the sectional and weak thresholds as
functions of a fixed block-length d for the entire range of $\alpha$, i.e. for any $0 < \alpha < 1$.

We organize the rest of the paper in the following way. In Section 3 we introduce two key theorems that
will be the heart of our subsequent analysis. In Section 4 we determine the values of the strong, sectional,
and weak thresholds for a given block-length d under the assumption that the null-space of the matrix
A is uniformly distributed in the Grassmannian. In Section 5 we determine the asymptotic values of the
strong, sectional, and weak thresholds assuming large block length d. In Section 6 we present the results
of the conducted numerical experiments and finally, in Section 7 we discuss obtained results and possible
directions for future work.

3 Null-space and escape through a mesh theorems 

In this section we introduce two useful theorems that will be of key importance in our subsequent analysis.
First we recall a null-space characterization of the matrix A which establishes a guarantee that the
solutions of (1) and (3) coincide. The following theorem from [78, 81, 83] provides this characterization. Set $\mathcal{K}$
to be the set of all subsets of size k of $\{1, 2, \dots, n\}$; also if $\kappa \in \mathcal{K}$, then $\kappa^c = \{1, 2, \dots, n\} \setminus \kappa$.

Theorem 1. ([83]) Assume that A is a dm x dn measurement matrix, y = Ax, and x is k-block-sparse.
Then the solutions of (1) and (3) coincide if and only if for all nonzero $w \in R^{dn}$ where $Aw = 0$ and all
$\kappa \in \mathcal{K}$

$$\sum_{i \in \kappa} \|W_i\|_2 < \sum_{i \in \kappa^c} \|W_i\|_2 \qquad (4)$$

where $W_i = (w_{(i-1)d+1}, w_{(i-1)d+2}, \dots, w_{id})^T$, $i = 1, 2, \dots, n$.
The following three remarks seem to be in order. 

Remark 1: The following simplification of the previous theorem is also well-known. Let $w \in R^{dn}$ be
such that $Aw = 0$. Further, let $W_{(norm)} = (\|W_1\|_2, \|W_2\|_2, \dots, \|W_n\|_2)^T$ and let $|W_{(norm)}|_{(i)}$ be the
i-th smallest of the elements of $W_{(norm)}$. Set $\bar{W} = (|W_{(norm)}|_{(1)}, |W_{(norm)}|_{(2)}, \dots, |W_{(norm)}|_{(n)})^T$. If
$(\forall w \,|\, Aw = 0)\ \sum_{i=n-k+1}^{n} \bar{W}_i < \sum_{i=1}^{n-k} \bar{W}_i$, where $\bar{W}_i$ is the i-th element of $\bar{W}$, then the solutions of (1)
and (3) coincide.
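
As an illustration of the sorted-norm condition in Remark 1, the following minimal Python sketch checks the inequality for a single given vector w. The function names are hypothetical, and the sketch tests one vector only; the remark requires the inequality for every nonzero w in the null-space of A, which this code does not attempt to verify.

```python
import math

def sorted_block_norms(w, d):
    """Block l2 norms of w sorted in non-decreasing order (the vector W-bar)."""
    if d <= 0 or len(w) % d != 0:
        raise ValueError("len(w) must be a multiple of the positive block length d")
    norms = [math.sqrt(sum(v * v for v in w[i:i + d])) for i in range(0, len(w), d)]
    return sorted(norms)

def satisfies_sorted_condition(w, d, k):
    """True if the k largest block norms sum to strictly less than the other n-k."""
    w_bar = sorted_block_norms(w, d)
    n = len(w_bar)
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return sum(w_bar[n - k:]) < sum(w_bar[:n - k])
```

Only the k largest block norms need to be compared against the rest, because that subset is the worst case among all subsets of size k in (4).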

Remark 2: The characterization given in the previous theorem (and proven in [83]) is a mere analogue to
the similar characterizations related to the equivalence of (1) and (2) from e.g. [32,35,42,57,80,83,95,99].
If instead of $\ell_1$ one, for example, uses an $\ell_q$-optimization ($0 < q < 1$) in (2) then characterizations similar
to the ones from [32,35,42,57,83,95,99] can be derived as well [48-50]. In a similar fashion one could
then derive an equivalent to the previous theorem for the $\ell_2/\ell_q$-optimization, $0 < q < 1$.

Remark 3: Checking if the condition given in the above theorem is satisfied for a given matrix A is a 
very important and difficult problem. Although it is not the main topic of the present paper, we do mention in 
passing that a possible approximate way of solving it would be a generalization of results from e.g. [25,54]. 

Clearly, if one can construct the matrix A such that (4) holds then the solution of (3) would be the
solution of (1). If one assumes that m and k are proportional to n (the case of our interest in this paper) then
the construction of the deterministic matrices A that would satisfy (4) is not an easy task. However, if one
turns to random matrices this appears to be significantly easier. In the following sections we will show that
this is indeed possible for a particular type of random matrices.

More precisely, as we have already hinted earlier, we will consider the random matrices A that have
the null-space uniformly distributed in the Grassmannian. The following phenomenal result from [47] that
relates to such matrices will be one of the key ingredients in the analysis that will follow.

Theorem 2. ([47] Escape through a mesh) Let S be a subset of the unit Euclidean sphere $S^{dn-1}$ in $R^{dn}$.
Let Y be a random $d(n-m)$-dimensional subspace of $R^{dn}$, distributed uniformly in the Grassmannian with
respect to the Haar measure. Let

$$w(S) = E \sup_{w \in S} (h^T w) \qquad (5)$$

where h is a random column vector in $R^{dn}$ with i.i.d. $\mathcal{N}(0, 1)$ components, w is a dn-dimensional column
vector from S, and $h^T$ is the transpose of h. Assume that $w(S) < \left(\sqrt{dm} - \frac{1}{4\sqrt{dm}}\right)$. Then

$$P(Y \cap S = \emptyset) > 1 - 3.5\, e^{-\frac{\left(\sqrt{dm} - \frac{1}{4\sqrt{dm}} - w(S)\right)^2}{18}}. \qquad (6)$$

Remark: Gordon's original constant 3.5 was substituted by 2.5 in [74]. Both constants are fine for our 
subsequent analysis. 
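
To get a feeling for how quickly the bound in Theorem 2 approaches 1, the following minimal Python sketch evaluates its right-hand side. It assumes the commonly stated form of Gordon's bound, $1 - 3.5\exp(-(\sqrt{dm} - \frac{1}{4\sqrt{dm}} - w(S))^2/18)$; the function name and the `constant` parameter are illustrative, and the value of the width w(S) must be supplied by the caller.

```python
import math

def escape_probability_lower_bound(d, m, width, constant=3.5):
    """Lower bound on P(Y misses S) for a given Gaussian width w(S) = width.

    Assumes the exponent -(sqrt(dm) - 1/(4 sqrt(dm)) - width)^2 / 18.
    The returned value can be negative, in which case the bound is vacuous.
    """
    root = math.sqrt(d * m)
    threshold = root - 1.0 / (4.0 * root)
    if width >= threshold:
        raise ValueError("the theorem requires width < sqrt(dm) - 1/(4 sqrt(dm))")
    gap = threshold - width
    return 1.0 - constant * math.exp(-gap * gap / 18.0)
```

When the width is a fixed fraction of $\sqrt{dm}$ the gap grows like $\sqrt{dm}$, so the bound tends to 1 exponentially fast in dm; this is the regime used in the analysis that follows.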

4 Probabilistic analysis of the null-space characterizations 

In this section we probabilistically analyze the validity of the null-space characterization given in Theorem 1.
In the first subsection of this section we will show how one can obtain the values of the strong threshold $\beta_s$
for the entire range $0 < \alpha < 1$ based on such an analysis. In the later two subsections we will extend the
strong threshold analysis and obtain the values of the sectional and weak thresholds.

4.1 Strong threshold

As masterfully noted in [74], Theorem 2 can be used to probabilistically analyze (4) (and as we will see later in
the paper, many of its variants). Namely, let S in (5) be

$$S_s = \left\{ w \in S^{dn-1} \ \Big|\ \sum_{i=n-k+1}^{n} \bar{W}_i \ge \sum_{i=1}^{n-k} \bar{W}_i \right\} \qquad (7)$$

where as earlier the notation $\bar{W}$ is used to denote the vector obtained by sorting the elements of $W_{(norm)}$
in non-decreasing order (essentially, $\bar{W}$ is a vector obtained by sorting magnitudes of blocks $W_i$ in
non-decreasing order). Also, here and in an analogous fashion in the later sections of the paper, we assume that
k is such that there is an $\alpha$, $0 < \alpha < 1$, such that the solutions of (1) and (3) coincide. Let Y be a $d(n-m)$
dimensional subspace of $R^{dn}$ uniformly distributed in the Grassmannian. Furthermore, let Y be the null-space of
A. Then as long as $w(S_s) < \sqrt{dm} - \frac{1}{4\sqrt{dm}}$, Y will miss $S_s$ (i.e. (4) will be satisfied) with probability no
smaller than the one given in (6). More precisely, if $\alpha = \frac{m}{n}$ is a constant (the case of interest in this paper),
n, m are large, and $w(S_s)$ is smaller than but proportional to $\sqrt{dm}$ then $P(Y \cap S_s = \emptyset) \to 1$. This in turn
is equivalent to having

$$P\left(\forall w \in R^{dn} \,|\, Aw = 0,\ \sum_{i=n-k+1}^{n} \bar{W}_i < \sum_{i=1}^{n-k} \bar{W}_i\right) \to 1$$

which according to Theorem 1 (or more precisely according to Remark 1 after Theorem 1) means that the
solutions of (1) and (3) coincide with probability tending to 1. For any given value of $\alpha \in (0, 1)$ a threshold value of
$\beta$ can then be determined as a maximum $\beta$ such that $w(S_s) < \sqrt{dm} - \frac{1}{4\sqrt{dm}}$. That maximum $\beta$ will
be exactly the value of the strong threshold $\beta_s$. If one is only concerned with finding a possible value for
$\beta_s$ it is easy to note that instead of computing $w(S_s)$ it is sufficient to find an upper bound on it. However, to
determine as good values of $\beta_s$ as possible, the upper bound on $w(S_s)$ should be as tight as possible. The
main contribution of this work will be a fairly precise estimate of $w(S_s)$.

In the following subsections we present a way to get such an estimate. To simplify the exposition we first
set $w(h, S_s) = \max_{w \in S_s} (h^T w)$. In order to upper-bound $w(S_s)$ we will first in Subsection 4.1.1 determine
an upper bound $B_s$ on $w(h, S_s)$. The expected value with respect to h of such an upper bound will be an
upper bound on $w(S_s)$. In Subsection 4.1.2 we will compute an upper bound on that expected value, i.e. we
will compute an upper bound on $E(B_s)$. That quantity will be an upper bound on $w(S_s)$ since according to
the following $E(B_s)$ is an upper bound on $w(S_s)$

$$w(S_s) = E w(h, S_s) = E\left(\max_{w \in S_s} (h^T w)\right) \le E(B_s). \qquad (8)$$

4.1.1 Upper-bounding $w(h, S_s)$

Let $H_i = (h_{(i-1)d+1}, h_{(i-1)d+2}, \dots, h_{id})^T$, $i = 1, 2, \dots, n$. From the definition of the set $S_s$ given in (7)
it easily follows that if w is in $S_s$ then any vector obtained from w by rotating (essentially multiplying by
orthogonal matrices) any subset of its blocks $W_i$, $1 \le i \le n$, in any direction is also in $S_s$. The directions
of vectors $W_i$, $1 \le i \le n$, can therefore be chosen so that they match the directions of vectors $H_i$, $1 \le i \le n$,
of the corresponding blocks in h. We then easily have

$$w(h, S_s) = \max_{w \in S_s} (h^T w) = \max_{w \in S_s} \sum_{i=1}^{n} |H_i^T W_i| = \max_{w \in S_s} \sum_{i=1}^{n} \|H_i\|_2 \|W_i\|_2. \qquad (9)$$

Let $H_{(norm)} = (\|H_1\|_2, \|H_2\|_2, \dots, \|H_n\|_2)^T$. Further, let $|H_{(norm)}|_{(i)}$ be the i-th smallest of the elements
of $H_{(norm)}$. Set $\bar{H} = (|H_{(norm)}|_{(1)}, |H_{(norm)}|_{(2)}, \dots, |H_{(norm)}|_{(n)})^T$. If $w \in S_s$ then a vector obtained by
permuting the blocks of w in any possible way is also in $S_s$. Then (9) can be rewritten as

$$w(h, S_s) = \max_{w \in S_s} \sum_{i=1}^{n} \bar{H}_i \|W_i\|_2 \qquad (10)$$

where $\bar{H}_i$ is the i-th element of vector $\bar{H}$. Let $\hat{w}$ be the solution of the maximization on the right-hand side of
(10). Further let $\hat{W}_i = (\hat{w}_{(i-1)d+1}, \hat{w}_{(i-1)d+2}, \dots, \hat{w}_{id})^T$, $i = 1, 2, \dots, n$. It then easily follows that $\|\hat{W}_n\|_2 \ge
\|\hat{W}_{n-1}\|_2 \ge \|\hat{W}_{n-2}\|_2 \ge \dots \ge \|\hat{W}_1\|_2$. To see this assume that there is a pair of indexes $n_1, n_2$ such that
$n_1 < n_2$ and $\|\hat{W}_{n_1}\|_2 > \|\hat{W}_{n_2}\|_2$. However, $\|\hat{W}_{n_1}\|_2 \bar{H}_{n_1} + \|\hat{W}_{n_2}\|_2 \bar{H}_{n_2} < \|\hat{W}_{n_2}\|_2 \bar{H}_{n_1} + \|\hat{W}_{n_1}\|_2 \bar{H}_{n_2}$
and $\hat{w}$ would not be the optimal solution of the maximization on the right-hand side of (10).
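
The exchange argument above is an instance of the rearrangement inequality. The following minimal Python sketch, with hypothetical function names, compares a brute-force search over all pairings of two small nonnegative vectors with the pairing obtained by sorting both in the same order; it is meant only as a numerical illustration of why the block norms of the maximizer are ordered like the elements of $\bar{H}$.

```python
from itertools import permutations

def best_pairing_bruteforce(h_bar, y):
    """Maximum of sum_i h_bar[i] * y[p(i)] over all permutations p of y."""
    return max(sum(a * b for a, b in zip(h_bar, perm)) for perm in permutations(y))

def sorted_pairing(h_bar, y):
    """Value obtained by pairing the i-th smallest of h_bar with the i-th smallest of y."""
    return sum(a * b for a, b in zip(sorted(h_bar), sorted(y)))
```

The brute-force search is factorial in the vector length and is usable only for very small examples; the sorted pairing reaches the same value directly.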
Let $y = (y_1, y_2, \dots, y_n)^T \in R^n$. Then one can simplify (10) in the following way

$$\begin{aligned} w(h, S_s) = \max_{y \in R^n}\ & \sum_{i=1}^{n} \bar{H}_i y_i \\ \text{subject to}\ & y_i \ge 0,\ 1 \le i \le n \\ & \sum_{i=n-k+1}^{n} y_i \ge \sum_{i=1}^{n-k} y_i \\ & \sum_{i=1}^{n} y_i^2 \le 1. \end{aligned} \qquad (11)$$

One can add the sorting constraints on the elements of y in the optimization problem above. However,
they would be redundant, i.e. any solution $\hat{y}$ to the above optimization problem will automatically satisfy
$\hat{y}_n \ge \hat{y}_{n-1} \ge \dots \ge \hat{y}_1$. To determine an upper bound on $w(h, S_s)$ we will use the method of Lagrange
duality. The derivation of the Lagrange dual upper bound will closely follow a similar derivation from [82]. For
completeness we reproduce it here as well. Before deriving the Lagrange dual we slightly modify (11)
in the following way

$$\begin{aligned} -w(h, S_s) = \min_{y \in R^n}\ & -\sum_{i=1}^{n} \bar{H}_i y_i \\ \text{subject to}\ & y_i \ge 0,\ 1 \le i \le n \\ & \sum_{i=n-k+1}^{n} y_i \ge \sum_{i=1}^{n-k} y_i \\ & \sum_{i=1}^{n} y_i^2 \le 1. \end{aligned} \qquad (12)$$

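To further facilitate writing let $z \in R^n$ be a column vector such that $z_i = 1$, $1 \le i \le (n - k)$ and
$z_i = -1$, $n - k + 1 \le i \le n$. Further, let $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_n)^T \in R^n$. Following, e.g. [11], we can write
the dual of the optimization problem (12) and its optimal value $w_{up}(h, S_s)$ as

$$\begin{aligned} -w_{up}(h, S_s) = \max_{\gamma, \nu, \lambda} \min_{y}\ & -\bar{H}^T y + \gamma \|y\|_2^2 - \gamma + \nu z^T y - \lambda^T y \\ \text{subject to}\ & \nu \ge 0,\ \gamma \ge 0 \\ & \lambda_i \ge 0,\ 1 \le i \le n. \end{aligned} \qquad (13)$$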

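One can then transform the objective function in the following way

$$\begin{aligned} -w_{up}(h, S_s) = \max_{\gamma, \nu, \lambda} \min_{y}\ & \left\| \sqrt{\gamma}\, y - \frac{\lambda + \bar{H} - \nu z}{2\sqrt{\gamma}} \right\|_2^2 - \gamma - \frac{\|\lambda + \bar{H} - \nu z\|_2^2}{4\gamma} \\ \text{subject to}\ & \nu \ge 0,\ \gamma \ge 0 \\ & \lambda_i \ge 0,\ 1 \le i \le n. \end{aligned} \qquad (14)$$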

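After trivially solving the inner minimization in (14) we obtain

$$\begin{aligned} w_{up}(h, S_s) = \min_{\gamma, \nu, \lambda}\ & \gamma + \frac{\|\lambda + \bar{H} - \nu z\|_2^2}{4\gamma} \\ \text{subject to}\ & \nu \ge 0,\ \gamma \ge 0 \\ & \lambda_i \ge 0,\ 1 \le i \le n. \end{aligned} \qquad (15)$$

Minimization over $\gamma$ is straightforward and one easily obtains that $\gamma = \frac{\|\lambda + \bar{H} - \nu z\|_2}{2}$ is optimal. Plugging this
value of $\gamma$ back in the objective function of the optimization problem (15) one obtains

$$\begin{aligned} w_{up}(h, S_s) = \min_{\nu, \lambda}\ & \|\lambda + \bar{H} - \nu z\|_2 \\ \text{subject to}\ & \nu \ge 0 \\ & \lambda_i \ge 0,\ 1 \le i \le n. \end{aligned} \qquad (16)$$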
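
The minimization over $\gamma$ that leads to (16) can be checked numerically. In the minimal Python sketch below, `sq_norm` stands for the value of $\|\lambda + \bar{H} - \nu z\|_2^2$ (assumed positive) and both function names are illustrative; the sketch only confirms that $\gamma + \frac{\text{sq\_norm}}{4\gamma}$ is minimized at $\gamma = \frac{\sqrt{\text{sq\_norm}}}{2}$, where it equals $\sqrt{\text{sq\_norm}}$.

```python
import math

def gamma_objective(gamma, sq_norm):
    """Objective of (15) as a function of gamma for a fixed squared norm."""
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    return gamma + sq_norm / (4.0 * gamma)

def optimal_gamma(sq_norm):
    """Stationary point of gamma + sq_norm / (4 gamma), i.e. sqrt(sq_norm) / 2."""
    return math.sqrt(sq_norm) / 2.0
```

For example, with `sq_norm = 9` the optimal value of gamma is 1.5 and the objective equals 3, the square root of 9, in agreement with the objective of (16).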

By duality, $-w_{up}(h, S_s) \le -w(h, S_s)$ which easily implies $w(h, S_s) \le w_{up}(h, S_s)$. Therefore $w_{up}(h, S_s)$
is an upper bound on $w(h, S_s)$. (In fact one can easily show that strong duality holds and that $w(h, S_s) =
w_{up}(h, S_s)$; however, as explained earlier, for our analysis showing that $w_{up}(h, S_s)$ is an upper bound on
$w(h, S_s)$ is sufficient.) Along the same lines, one can easily spot that any feasible values $\nu$ and $\lambda$ in (16)
will provide a valid upper bound on $w_{up}(h, S_s)$ and hence a valid upper bound on $w(h, S_s)$. In what follows
we will in fact determine the optimal values for $\nu$ and $\lambda$. However, since it is not necessary for our analysis
we will not put too much effort into proving that these values are optimal. As we have stated earlier, for our
analysis it will be enough to show that the values for $\nu$ and $\lambda$ that we will obtain are feasible in (16).

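To facilitate the exposition in what follows, instead of dealing with the objective function given in (16)
we will be dealing with its squared value. Hence, we set $f(h, \nu, \lambda) = \|\lambda + \bar{H} - \nu z\|_2^2$. Now, let $\lambda =
(\lambda_1, \lambda_2, \dots, \lambda_c, 0, 0, \dots, 0)^T$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_c \ge 0$ where $c \le (n - k)$ is a crucial parameter that
will be determined later. The optimization over $\nu$ in (16) is then seemingly straightforward. Setting the
derivative of $f(h, \nu, \lambda)$ with respect to $\nu$ to zero we have

$$\begin{aligned} \frac{d\|\lambda + \bar{H} - \nu z\|_2^2}{d\nu} &= 0 \\ \Leftrightarrow\ -2(\lambda + \bar{H})^T z + 2\|z\|_2^2\, \nu &= 0 \\ \Leftrightarrow\ \nu &= \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}. \end{aligned} \qquad (17)$$

If $(\lambda + \bar{H})^T z \ge 0$ then $\nu = \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}$ is indeed the optimal in (16). For the time being let us assume that
$\lambda$, h, c are such that $\nu = \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2} \ge 0$. For $\nu = \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}$ we have

$$f\left(h, \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}, \lambda\right) = \left\| (\lambda + \bar{H})^T \left(I - \frac{z z^T}{z^T z}\right) \right\|_2^2 = (\lambda + \bar{H})^T \left(I - \frac{z z^T}{z^T z}\right) (\lambda + \bar{H}). \qquad (18)$$

Simplifying (18) further (note that $z^T z = n$ and that $z_i = 1$ for $1 \le i \le c$) we obtain

$$f\left(h, \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}, \lambda\right) = \sum_{i=1}^{n} \bar{H}_i^2 + \sum_{i=1}^{c} \lambda_i^2 + 2\sum_{i=1}^{c} \lambda_i \bar{H}_i - \frac{\left(\sum_{i=1}^{c} \lambda_i\right)^2}{n} - \frac{2\left(\sum_{i=1}^{c} \lambda_i\right)(\bar{H}^T z)}{n} - \frac{(\bar{H}^T z)^2}{n}. \qquad (19)$$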

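To determine good values for $\lambda$ we proceed by setting the derivatives of $f\left(h, \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}, \lambda\right)$ with respect to
$\lambda_i$, $1 \le i \le c$, to zero

$$\frac{d f\left(h, \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}, \lambda\right)}{d\lambda_i} = 2\lambda_i + 2\bar{H}_i - 2\frac{\sum_{j=1}^{c} \lambda_j}{n} - 2\frac{(\bar{H}^T z)}{n} = 0. \qquad (20)$$

Summing the above derivatives over i and equalling with zero we obtain

$$2\sum_{i=1}^{c} \lambda_i + 2\sum_{i=1}^{c} \bar{H}_i - 2c\,\frac{\sum_{i=1}^{c} \lambda_i}{n} - 2c\,\frac{(\bar{H}^T z)}{n} = 0. \qquad (21)$$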

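From (21) one then easily finds

$$\sum_{i=1}^{c} \lambda_i = \frac{c\,(\bar{H}^T z)}{n - c} - \frac{n \sum_{i=1}^{c} \bar{H}_i}{n - c}. \qquad (22)$$

Plugging the value for $\sum_{i=1}^{c} \lambda_i$ obtained in (22) in (20) we have

$$\lambda_i = \frac{(\bar{H}^T z)}{n} - \bar{H}_i + \frac{\sum_{j=1}^{c} \lambda_j}{n} = \frac{(\bar{H}^T z)}{n} - \bar{H}_i + \frac{c\,(\bar{H}^T z)}{n(n - c)} - \frac{\sum_{j=1}^{c} \bar{H}_j}{n - c}$$

and finally

$$\begin{aligned} \lambda_i &= \frac{(\bar{H}^T z) - \sum_{j=1}^{c} \bar{H}_j}{n - c} - \bar{H}_i, \quad 1 \le i \le c \\ \lambda_i &= 0, \quad c + 1 \le i \le n. \end{aligned} \qquad (23)$$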
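Combining (17) and (22) we have

$$\nu = \frac{(\lambda + \bar{H})^T z}{n} = \frac{\bar{H}^T z + \sum_{i=1}^{c} \lambda_i}{n} = \frac{\bar{H}^T z + \frac{c\,(\bar{H}^T z)}{n - c} - \frac{n \sum_{i=1}^{c} \bar{H}_i}{n - c}}{n} = \frac{(\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i}{n - c}. \qquad (24)$$

From (23) and (24) we then have as expected

$$\nu = \lambda_i + \bar{H}_i, \quad 1 \le i \le c. \qquad (25)$$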



As long as we can find a c such that $\lambda_i$, $1 \le i \le c$, given in (23) are non-negative, $\nu$ will be non-negative as
well and $\nu$ and $\lambda$ will therefore be feasible in (16). This in turn implies

$$w(h, S_s) \le \sqrt{f(h, \nu, \lambda)} \qquad (26)$$

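where $f(h, \nu, \lambda)$ is computed for the values of $\lambda$ and $\nu$ given in (23) and (25), respectively. (In fact
determining the largest c such that $\lambda_i$, $1 \le i \le c$, given in (23) are non-negative will ensure that $\sqrt{f(h, \nu, \lambda)} =
w(h, S_s)$; however, as already stated earlier, this fact is not of any special importance for our analysis.)
Let us now assume that c is fixed such that $\lambda$ and $\nu$ are as given in (23) and (25). Then combining (19),
(23), and (25) we have

$$f\left(h, \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}, \lambda\right) = \sum_{i=1}^{n} \bar{H}_i^2 + 2\nu \sum_{i=1}^{c} \bar{H}_i - 2\sum_{i=1}^{c} \bar{H}_i^2 + c\nu^2 - 2\nu \sum_{i=1}^{c} \bar{H}_i + \sum_{i=1}^{c} \bar{H}_i^2 - \frac{\left(\sum_{i=1}^{c} \lambda_i + \bar{H}^T z\right)^2}{n}. \qquad (27)$$

Combining (22) and (24) we obtain

$$\sum_{i=1}^{c} \lambda_i + \bar{H}^T z = n\nu. \qquad (28)$$

Further, combining (27) and (28) we find

$$\begin{aligned} f\left(h, \frac{(\lambda + \bar{H})^T z}{\|z\|_2^2}, \lambda\right) &= \sum_{i=1}^{n} \bar{H}_i^2 + c\nu^2 - \sum_{i=1}^{c} \bar{H}_i^2 - \frac{(n\nu)^2}{n} \\ &= \sum_{i=1}^{n} \bar{H}_i^2 + (c - n)\nu^2 - \sum_{i=1}^{c} \bar{H}_i^2 \\ &= \sum_{i=1}^{n} \bar{H}_i^2 - \sum_{i=1}^{c} \bar{H}_i^2 - \frac{\left((\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i\right)^2}{n - c}. \end{aligned} \qquad (29)$$

Finally, combining (26) and (29) we have

$$w(h, S_s) \le \sqrt{\sum_{i=1}^{n} \bar{H}_i^2 - \sum_{i=1}^{c} \bar{H}_i^2 - \frac{\left((\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i\right)^2}{n - c}} = \sqrt{\sum_{i=c+1}^{n} \bar{H}_i^2 - \frac{\left((\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i\right)^2}{n - c}}. \qquad (30)$$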



Clearly, as long as $(\bar{H}^T z) \ge 0$ there will be a $c \le n - k$ (it is possible that c = 0) such that the quantity on the
rightmost side of (30) is an upper bound on $w(h, S_s)$.
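
The construction of this subsection can be summarized in a minimal Python sketch. It sorts the block norms of h, scans the candidate values of c, keeps those for which $\nu = \frac{(\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i}{n - c}$ and $\lambda_i = \nu - \bar{H}_i$ are non-negative, and returns the smallest resulting value of $\sqrt{\sum_{i=c+1}^{n} \bar{H}_i^2 - \frac{((\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i)^2}{n - c}}$. The function names are hypothetical, and scanning all c and keeping the smallest bound is a choice made for this illustration (any feasible c gives a valid bound); it is not claimed to be how the thresholds in the paper were computed.

```python
import math

def sorted_block_norms(h, d):
    """Block l2 norms of h sorted in non-decreasing order (the vector H-bar)."""
    return sorted(math.sqrt(sum(v * v for v in h[i:i + d])) for i in range(0, len(h), d))

def dual_objective(h_bar, z, nu, lam):
    """Norm || lam + h_bar - nu * z ||_2, the objective of the dual problem."""
    return math.sqrt(sum((l + hb - nu * zi) ** 2 for l, hb, zi in zip(lam, h_bar, z)))

def strong_upper_bound(h, d, k):
    """Return (bound, c, nu, lam) with the smallest feasible bound on w(h, S_s).

    Requires 1 <= k <= n, where n = len(h) / d is the number of blocks.
    """
    h_bar = sorted_block_norms(h, d)
    n = len(h_bar)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    z = [1.0] * (n - k) + [-1.0] * k
    hz = sum(hb * zi for hb, zi in zip(h_bar, z))
    # nu = 0 and lam = 0 are always feasible and give the trivial bound ||h||_2.
    best = (math.sqrt(sum(hb * hb for hb in h_bar)), 0, 0.0, [0.0] * n)
    for c in range(0, n - k + 1):
        head = sum(h_bar[:c])
        nu = (hz - head) / (n - c)
        lam = [nu - hb for hb in h_bar[:c]] + [0.0] * (n - c)
        if nu < 0.0 or any(l < 0.0 for l in lam):
            continue  # multipliers infeasible for this c
        f = sum(hb * hb for hb in h_bar[c:]) - (hz - head) ** 2 / (n - c)
        bound = math.sqrt(max(f, 0.0))
        if bound < best[0]:
            best = (bound, c, nu, lam)
    return best
```

As a small check, for sorted block norms (0.1, 2, 3, 4) and k = 1 the sketch selects c = 1 with $\nu = 1/3$, and the bound $\sqrt{86/3}$ is attained by the normalized vector $y \propto (0, 5/3, 8/3, 13/3)$, which is feasible in the primal maximization.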

To facilitate the exposition in the following subsection we will make the upper bound given in (30)
slightly more pessimistic in the following lemma.

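Lemma 1. Let $h \in R^{dn}$ be a vector with i.i.d. zero-mean unit variance Gaussian components. Let $H_i =
(h_{(i-1)d+1}, h_{(i-1)d+2}, \dots, h_{id})^T$, $i = 1, 2, \dots, n$, and $H_{(norm)} = (\|H_1\|_2, \|H_2\|_2, \dots, \|H_n\|_2)^T$. Further,
let $|H_{(norm)}|_{(i)}$ be the i-th smallest of the elements of $H_{(norm)}$. Set $\bar{H} = (|H_{(norm)}|_{(1)}, |H_{(norm)}|_{(2)}, \dots,
|H_{(norm)}|_{(n)})^T$ and $w(h, S_s) = \max_{w \in S_s} (h^T w)$ where $S_s$ is as defined in (7). Let $z \in R^n$ be a column
vector such that $z_i = 1$, $1 \le i \le (n - k)$ and $z_i = -1$, $n - k + 1 \le i \le n$. Then

$$w(h, S_s) \le B_s \qquad (31)$$

where

$$B_s = \begin{cases} \sqrt{\sum_{i=1}^{n} \bar{H}_i^2} & \text{if } \zeta_s(h, c_s) \le 0 \\ \sqrt{\sum_{i=c_s+1}^{n} \bar{H}_i^2 - \frac{\left((\bar{H}^T z) - \sum_{i=1}^{c_s} \bar{H}_i\right)^2}{n - c_s}} & \text{if } \zeta_s(h, c_s) > 0, \end{cases} \qquad (32)$$

$\zeta_s(h, c) = \frac{(\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i}{n - c} - \bar{H}_c$ and $c_s = \delta_s n$ is a $c \le n - k$ such that

$$\frac{(1 - \epsilon)\, E\left((\bar{H}^T z) - \sum_{i=1}^{c} \bar{H}_i\right)}{n - c} - F_{\chi_d}^{-1}\left(\frac{(1 + \epsilon) c}{n}\right) = 0. \qquad (33)$$

$F_{\chi_d}^{-1}(\cdot)$ is the inverse cdf of the chi random variable with d degrees of freedom, i.e. it is the inverse cdf
of the random variable $\sqrt{\sum_{i=1}^{d} Z_i^2}$ where $Z_i$, $1 \le i \le d$, are independent zero-mean, unit variance Gaussian
random variables. $\epsilon > 0$ is an arbitrarily small constant independent of n.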

Proof. Follows from the previous analysis and (l30l ). □ 
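As an illustration of the bound in (30), the following minimal Python sketch evaluates it for one realization of $h$. It is not taken from the paper: the function names `strong_upper_bound` and `demo` are hypothetical, and the rule for picking $c$ (the smallest bound over all $c\le n-k$ with $\nu\ge 0$ and $\nu\ge\bar{H}_c$, i.e. dual feasibility) is an assumption made here for concreteness.

```python
import math
import random

def strong_upper_bound(h, d, k):
    """Evaluate the right-hand side of (30) for a sample h (hypothetical helper).

    The blocks of h have length d; k is the number of nonzero blocks.
    The bound is minimized over all c <= n-k for which the dual variables
    nu and lambda_i = nu - Hbar_i (i <= c) are non-negative.
    """
    n = len(h) // d
    norms = sorted(math.sqrt(sum(x * x for x in h[i * d:(i + 1) * d])) for i in range(n))
    total_sq = sum(v * v for v in norms)
    hz = sum(norms[:n - k]) - sum(norms[n - k:])      # Hbar^T z
    best = math.sqrt(total_sq)                        # trivial bound ||h||_2
    prefix, prefix_sq = 0.0, 0.0                      # sums over the c smallest norms
    for c in range(0, n - k + 1):
        if c > 0:
            prefix += norms[c - 1]
            prefix_sq += norms[c - 1] ** 2
        nu = (hz - prefix) / (n - c)
        if nu >= 0.0 and (c == 0 or nu >= norms[c - 1]):
            value = total_sq - prefix_sq - (hz - prefix) ** 2 / (n - c)
            best = min(best, math.sqrt(max(value, 0.0)))
    return best

def demo(seed, d=3, n=20, k=2):
    """Return (largest block norm, bound from (30), ||h||_2) for a random Gaussian h."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(d * n)]
    max_block = max(math.sqrt(sum(x * x for x in h[i * d:(i + 1) * d])) for i in range(n))
    return max_block, strong_upper_bound(h, d, k), math.sqrt(sum(x * x for x in h))
```

The bound always lies between the largest block norm (attained by a feasible $w$ supported on one block) and the trivial value $\|h\|_2$.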
4.1.2 Computing an upper bound on $E(B_s)$

In this subsection we will compute an upper bound on $E(B_s)$. Again, the derivation will closely follow that of [82]. (However, due to a few block-structure related differences in the derivations of Lemmas 2 and 3 we include it here.) As a first step we determine a lower bound on $P(\zeta_s(h,c_s)>0)$. We start with a sequence of obvious inequalities

$$\begin{aligned}P(\zeta_s(h,c_s)>0)&\ge P\Big(\frac{(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i}{n-c_s}\ge F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c_s}{n}\Big)\ge\bar{H}_{c_s}\Big)\\&\ge P\Big(\frac{(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i}{n-c_s}\ge\frac{(1-\epsilon)E\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)}{n-c_s}\ \text{and}\ F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c_s}{n}\Big)\ge\bar{H}_{c_s}\Big)\\&\ge 1-P\Big(\frac{(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i}{n-c_s}\le\frac{(1-\epsilon)E\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)}{n-c_s}\Big)-P\Big(F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c_s}{n}\Big)\le\bar{H}_{c_s}\Big).\end{aligned} \tag{34}$$

The rest of the analysis assumes that $n$ is large so that $\delta_s$ can be assumed to be real (of course, $\delta_s$ is a proportionality constant independent of $n$). Using the results from [7] we obtain

$$P\Big(F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c_s}{n}\Big)\le\bar{H}_{c_s}\Big)\le\exp\Big\{-\frac{n\epsilon^2\delta_s}{2(1+\epsilon)}\Big\}. \tag{35}$$



We will also need the following brilliant result from [20]. Let $\xi(\cdot):R^{dn}\to R$ be a Lipschitz function such that $|\xi(a)-\xi(b)|\le\sigma\|a-b\|_2$. Let $a$ be a vector comprised of i.i.d. zero-mean, unit variance Gaussian random variables. Then

$$P\big((1-\epsilon)E\xi(a)\ge\xi(a)\big)\le\exp\Big\{-\frac{(\epsilon E\xi(a))^2}{2\sigma^2}\Big\}. \tag{36}$$

Let $\xi(h)=(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i$. The following lemma estimates $\sigma$ (for simplicity we assume $c_s=0$; the proof easily extends to the case when $c_s\ne 0$).

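Lemma 2. Let $a,b\in R^{dn}$. Let $A_i=(a_{(i-1)d+1},a_{(i-1)d+2},\dots,a_{id})^T$ and $B_i=(b_{(i-1)d+1},b_{(i-1)d+2},\dots,b_{id})^T$, $i=1,2,\dots,n$. Set $A_{(norm)}=(\|A_1\|_2,\|A_2\|_2,\dots,\|A_n\|_2)$ and $B_{(norm)}=(\|B_1\|_2,\|B_2\|_2,\dots,\|B_n\|_2)$. Further, let $|A_{(norm)}|_{(i)}$, $|B_{(norm)}|_{(i)}$ be the $i$-th smallest of the elements of $A_{(norm)}$, $B_{(norm)}$, respectively. Set $\bar{A}=(|A_{(norm)}|_{(1)},|A_{(norm)}|_{(2)},\dots,|A_{(norm)}|_{(n)})^T$ and $\bar{B}=(|B_{(norm)}|_{(1)},|B_{(norm)}|_{(2)},\dots,|B_{(norm)}|_{(n)})^T$. Then

$$|\xi(a)-\xi(b)|=\Big|\sum_{i=1}^{n-k}\bar{A}_i-\sum_{i=n-k+1}^{n}\bar{A}_i-\sum_{i=1}^{n-k}\bar{B}_i+\sum_{i=n-k+1}^{n}\bar{B}_i\Big|\le\sqrt{n}\sqrt{\sum_{i=1}^{dn}|a_i-b_i|^2}=\sqrt{n}\|a-b\|_2. \tag{37}$$

Proof. We have

$$\begin{aligned}\Big|\sum_{i=1}^{n-k}\bar{A}_i-\sum_{i=n-k+1}^{n}\bar{A}_i-\sum_{i=1}^{n-k}\bar{B}_i+\sum_{i=n-k+1}^{n}\bar{B}_i\Big|&\le\Big|\sum_{i=1}^{n-k}(\bar{A}_i-\bar{B}_i)\Big|+\Big|\sum_{i=n-k+1}^{n}(\bar{A}_i-\bar{B}_i)\Big|\\&\le\sum_{i=1}^{n-k}|\bar{A}_i-\bar{B}_i|+\sum_{i=n-k+1}^{n}|\bar{A}_i-\bar{B}_i|=\sum_{i=1}^{n}|\bar{A}_i-\bar{B}_i|\\&\le\sqrt{n}\sqrt{\sum_{i=1}^{n}|\bar{A}_i-\bar{B}_i|^2}=\sqrt{n}\sqrt{\sum_{i=1}^{n}|\bar{A}_i|^2+\sum_{i=1}^{n}|\bar{B}_i|^2-2\sum_{i=1}^{n}\bar{A}_i\bar{B}_i}\\&=\sqrt{n}\sqrt{\sum_{i=1}^{dn}|a_i|^2+\sum_{i=1}^{dn}|b_i|^2-2\sum_{i=1}^{n}\bar{A}_i\bar{B}_i}.\end{aligned} \tag{38}$$

Since the components of $\bar{A}$ and $\bar{B}$ are positive and sorted in the same non-decreasing order we have

$$\sum_{i=1}^{n}\bar{A}_i\bar{B}_i\ge\sum_{i=1}^{n}\|A_i\|_2\|B_i\|_2. \tag{39}$$

By the Cauchy-Schwarz inequality we have

$$\sum_{i=1}^{n}\|A_i\|_2\|B_i\|_2\ge\sum_{i=1}^{n}\sum_{j=1}^{d}a_{(i-1)d+j}b_{(i-1)d+j}=\sum_{i=1}^{dn}a_ib_i. \tag{40}$$

From (39) and (40) we obtain

$$\sum_{i=1}^{n}\bar{A}_i\bar{B}_i\ge\sum_{i=1}^{dn}a_ib_i. \tag{41}$$

Combining (38) and (41) we finally have

$$\begin{aligned}\Big|\sum_{i=1}^{n-k}\bar{A}_i-\sum_{i=n-k+1}^{n}\bar{A}_i-\sum_{i=1}^{n-k}\bar{B}_i+\sum_{i=n-k+1}^{n}\bar{B}_i\Big|&\le\sqrt{n}\sqrt{\sum_{i=1}^{dn}|a_i|^2+\sum_{i=1}^{dn}|b_i|^2-2\sum_{i=1}^{n}\bar{A}_i\bar{B}_i}\\&\le\sqrt{n}\sqrt{\sum_{i=1}^{dn}|a_i|^2+\sum_{i=1}^{dn}|b_i|^2-2\sum_{i=1}^{dn}a_ib_i}=\sqrt{n}\sqrt{\sum_{i=1}^{dn}|a_i-b_i|^2}.\end{aligned} \tag{42}$$

Connecting the beginning and the end in (42) establishes (37). □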
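The Lipschitz estimate of Lemma 2 can be spot-checked numerically. The sketch below is only an illustration under assumed parameters; the names `xi` and `max_violation` are hypothetical. It draws random pairs $a,b$ and records the largest value of $|\xi(a)-\xi(b)|-\sqrt{n}\|a-b\|_2$, which Lemma 2 says should never be positive.

```python
import math
import random

def xi(v, d, k):
    """xi(v) for c_s = 0: sum of the n-k smallest block norms minus the k largest."""
    n = len(v) // d
    norms = sorted(math.sqrt(sum(x * x for x in v[i * d:(i + 1) * d])) for i in range(n))
    return sum(norms[:n - k]) - sum(norms[n - k:])

def max_violation(trials, d, n, k, seed):
    """Largest observed |xi(a) - xi(b)| - sqrt(n) * ||a - b||_2 over random pairs."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(d * n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(d * n)]
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        worst = max(worst, abs(xi(a, d, k) - xi(b, d, k)) - math.sqrt(n) * dist)
    return worst
```

A random check of this kind cannot prove the lemma; it only guards against transcription mistakes in the statement.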



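For $\xi(h)=(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i$ the previous lemma then gives $\sigma\le\sqrt{n}$ (in fact, if there was no assumption that $c_s=0$ one would rather handily obtain $\sigma\le\sqrt{n-c_s}$ by merely recognizing that the length of all relevant vectors would be $n-c_s$ instead of $n$). As shown in [77] (and as we will see later in this paper), if $n$ is large and $\delta_s$ is a constant independent of $n$, $E\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)=\psi_sn$ where $\psi_s$ is independent of $n$ as well ($\psi_s$ is of course dependent on $\beta$ and $\delta_s$). Hence (36) with $\xi(h)=(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i$ gives us

$$P\Big(\frac{(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i}{n-c_s}\le\frac{(1-\epsilon)E\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)}{n-c_s}\Big)\le\exp\Big\{-\frac{(\epsilon\psi_sn)^2}{2n}\Big\}=\exp\Big\{-\frac{\epsilon^2\psi_s^2n}{2}\Big\}. \tag{43}$$

Combining (34), (35), and (43) we finally obtain

$$\begin{aligned}P(\zeta_s(h,c_s)>0)&\ge 1-P\Big(\frac{(\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i}{n-c_s}\le\frac{(1-\epsilon)E\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)}{n-c_s}\Big)-P\Big(F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c_s}{n}\Big)\le\bar{H}_{c_s}\Big)\\&\ge 1-\exp\Big\{-\frac{n\epsilon^2\delta_s}{2(1+\epsilon)}\Big\}-\exp\Big\{-\frac{\epsilon^2\psi_s^2n}{2}\Big\}.\end{aligned} \tag{44}$$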



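We now return to computing an upper bound on $E(B_s)$. By the definition of $B_s$ we have

$$E(B_s)=\int_{\zeta_s(h,c_s)\le 0}\sqrt{\sum_{i=1}^{n}\bar{H}_i^2}\,p(h)dh+\int_{\zeta_s(h,c_s)>0}\sqrt{\sum_{i=c_s+1}^{n}\bar{H}_i^2-\frac{\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n-c_s}}\,p(h)dh \tag{45}$$

where $p(h)$ is the joint pdf of the i.i.d. zero-mean unit variance Gaussian components of vector $h$. Since the functions $\sqrt{\sum_{i=1}^{n}\bar{H}_i^2}$ and $p(h)$ are rotationally invariant and since the region $\zeta_s(h,c_s)\le 0$ takes up the same fraction of the surface area of a sphere of any radius we have

$$\int_{\zeta_s(h,c_s)\le 0}\sqrt{\sum_{i=1}^{n}\bar{H}_i^2}\,p(h)dh=E\sqrt{\sum_{i=1}^{n}\bar{H}_i^2}\int_{\zeta_s(h,c_s)\le 0}p(h)dh\le\sqrt{dn}\int_{\zeta_s(h,c_s)\le 0}p(h)dh. \tag{46}$$

Combining (44) and (46) we further have

$$\int_{\zeta_s(h,c_s)\le 0}\sqrt{\sum_{i=1}^{n}\bar{H}_i^2}\,p(h)dh\le\sqrt{dn}\Big(\exp\Big\{-\frac{n\epsilon^2\delta_s}{2(1+\epsilon)}\Big\}+\exp\Big\{-\frac{\epsilon^2\psi_s^2n}{2}\Big\}\Big). \tag{47}$$

It also easily follows that

$$\begin{aligned}\int_{\zeta_s(h,c_s)>0}\sqrt{\sum_{i=c_s+1}^{n}\bar{H}_i^2-\frac{\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n-c_s}}\,p(h)dh&\le E\sqrt{\sum_{i=c_s+1}^{n}\bar{H}_i^2-\frac{\big((\bar{H}^Tz)-\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n-c_s}}\\&\le\sqrt{E\sum_{i=c_s+1}^{n}\bar{H}_i^2-\frac{\big(E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n-c_s}}.\end{aligned} \tag{48}$$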



Finally, combining (45), (47), and (48) we obtain the following lemma.

Lemma 3. Assume the setup of Lemma 1. Let further $\psi_s=\dfrac{E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i}{n}$. Then

$$E(B_s)\le\sqrt{dn}\Big(\exp\Big\{-\frac{n\epsilon^2\delta_s}{2(1+\epsilon)}\Big\}+\exp\Big\{-\frac{\epsilon^2\psi_s^2n}{2}\Big\}\Big)+\sqrt{E\sum_{i=c_s+1}^{n}\bar{H}_i^2-\frac{\big(E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n-c_s}}. \tag{49}$$

Proof. Follows from the previous discussion. □

If $n$ is large the first term on the right-hand side of (49) goes to zero. In a fashion similar to the one presented in [82], from the earlier bounds and (49) it then easily follows that for a fixed $\alpha$ one can determine $\beta_s$ as a maximum $\beta$ such that

$$\alpha d>\frac{E\sum_{i=c_s+1}^{n}\bar{H}_i^2}{n}-\frac{\big(E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n(n-c_s)}. \tag{50}$$

As earlier $k=\beta n$ and $z\in R^n$ is a column vector such that $z_i=1$, $1\le i\le(n-k)$, and $z_i=-1$, $n-k+1\le i\le n$ ($\beta$ is therefore hidden in the above equation in $z$). As in [82], finding $\beta_s$ for a given fixed $\alpha$ is equivalent to finding the minimum $\alpha$ such that (50) holds for a fixed $\beta_s$. Let $\beta_s^{max}$ be $\beta_s$ such that the minimum $\alpha$ that satisfies (50) is 1. Our goal is then to determine the minimum $\alpha$ that satisfies (50) for any $\beta_s\in[0,\beta_s^{max}]$.

In the rest of this subsection we show how the right-hand side of (50) can be computed for a randomly chosen fixed $\beta_s$. As in [82] we do so in two steps:

1. We first determine $c_s$.

2. We then compute $\lim_{n\to\infty}\Big(\dfrac{E\sum_{i=c_s+1}^{n}\bar{H}_i^2}{n}-\dfrac{\big(E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n(n-c_s)}\Big)$ with $c_s$ found in step 1.

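Step 1:

From Lemma 1 we have $c_s=\delta_sn$ is a $c$ such that

$$\begin{aligned}&\frac{(1-\epsilon)E\big(\sum_{i=1}^{n-\beta_sn}\bar{H}_i-\sum_{i=n-\beta_sn+1}^{n}\bar{H}_i-\sum_{i=1}^{c}\bar{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n}\Big)=0\\\Leftrightarrow\ &\frac{(1-\epsilon)\big(E\sum_{i=c+1}^{n}\bar{H}_i-2E\sum_{i=n-\beta_sn+1}^{n}\bar{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n}\Big)=0\end{aligned} \tag{51}$$

where as in Lemma 1 $\bar{H}_i=|H_{(norm)}|_{(i)}$ and $|H_{(norm)}|_{(i)}$ is the $i$-th smallest magnitude of blocks $H_i$ of $h$. We also recall that $h\in R^{dn}$ is a vector with i.i.d. zero-mean unit variance Gaussian random variables and $\epsilon>0$ is an arbitrarily small constant. Set $\theta_s=1-\delta_s$. Following [8,77] we have

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\theta_s)n+1}^{n}\bar{H}_i}{n}=\int_{F_{\chi_d}^{-1}(1-\theta_s)}^{\infty}t\,dF_{\chi_d}(t) \tag{52}$$

where we recall that $F_{\chi_d}(\cdot)$ is the cdf of any of $\|H_i\|_2$. Clearly, $\|H_i\|_2$ is a chi-distributed random variable with $d$ degrees of freedom. We then have for its pdf

$$dF_{\chi_d}(t)=\frac{2^{1-\frac{d}{2}}}{\Gamma(\frac{d}{2})}t^{d-1}e^{-\frac{t^2}{2}}dt,\quad t\ge 0 \tag{53}$$

where $\Gamma(\cdot)$ stands for the gamma function. The following integration then gives us $F_{\chi_d}^{-1}(1-\theta_s)$. Namely,

$$\frac{2^{1-\frac{d}{2}}}{\Gamma(\frac{d}{2})}\int_{0}^{F_{\chi_d}^{-1}(1-\theta_s)}t^{d-1}e^{-\frac{t^2}{2}}dt=1-\theta_s\ \Longrightarrow\ F_{\chi_d}^{-1}(1-\theta_s)=\sqrt{2\gamma_{inc}^{-1}\Big(1-\theta_s,\frac{d}{2}\Big)} \tag{54}$$

where $\gamma_{inc}^{-1}(1-\theta_s,\frac{d}{2})$ stands for the inverse of the incomplete gamma function with $\frac{d}{2}$ degrees of freedom evaluated at $(1-\theta_s)$. We further then find

$$\int_{F_{\chi_d}^{-1}(1-\theta_s)}^{\infty}t\,dF_{\chi_d}(t)=\frac{2^{1-\frac{d}{2}}}{\Gamma(\frac{d}{2})}\int_{F_{\chi_d}^{-1}(1-\theta_s)}^{\infty}t^{d}e^{-\frac{t^2}{2}}dt=\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\frac{(F_{\chi_d}^{-1}(1-\theta_s))^2}{2},\frac{d+1}{2}\Big)\Big) \tag{55}$$

where $\gamma_{inc}\big(\frac{(F_{\chi_d}^{-1}(1-\theta_s))^2}{2},\frac{d+1}{2}\big)$ stands for the incomplete gamma function with $\frac{d+1}{2}$ degrees of freedom evaluated at $\frac{(F_{\chi_d}^{-1}(1-\theta_s))^2}{2}$. From (54) and (55) we obtain

$$\int_{F_{\chi_d}^{-1}(1-\theta_s)}^{\infty}t\,dF_{\chi_d}(t)=\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(1-\theta_s,\frac{d}{2}\Big),\frac{d+1}{2}\Big)\Big). \tag{56}$$

Combination of (52) and (56) produces

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\theta_s)n+1}^{n}\bar{H}_i}{n}=\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(1-\theta_s,\frac{d}{2}\Big),\frac{d+1}{2}\Big)\Big). \tag{57}$$

In a completely analogous way we obtain

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\beta_s)n+1}^{n}\bar{H}_i}{n}=\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(1-\beta_s,\frac{d}{2}\Big),\frac{d+1}{2}\Big)\Big). \tag{58}$$

Similarly to (54) we easily determine

$$\frac{2^{1-\frac{d}{2}}}{\Gamma(\frac{d}{2})}\int_{0}^{F_{\chi_d}^{-1}((1+\epsilon)\delta_s)}t^{d-1}e^{-\frac{t^2}{2}}dt=(1+\epsilon)\delta_s\ \Longrightarrow\ F_{\chi_d}^{-1}((1+\epsilon)\delta_s)=\sqrt{2\gamma_{inc}^{-1}\Big((1+\epsilon)\delta_s,\frac{d}{2}\Big)}=\sqrt{2\gamma_{inc}^{-1}\Big((1+\epsilon)(1-\theta_s),\frac{d}{2}\Big)}. \tag{59}$$

Combination of (51), (57), (58), and (59) gives us the following equation for computing $\theta_s$

$$(1-\epsilon)\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\cdot\frac{\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(1-\theta_s,\frac{d}{2}),\frac{d+1}{2}\big)\Big)-2\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(1-\beta_s,\frac{d}{2}),\frac{d+1}{2}\big)\Big)}{\theta_s}-\sqrt{2\gamma_{inc}^{-1}\Big((1+\epsilon)(1-\theta_s),\frac{d}{2}\Big)}=0. \tag{60}$$

Let $\hat{\theta}_s$ be the solution of (60). Then $\delta_s=1-\hat{\theta}_s$ and $c_s=\delta_sn=(1-\hat{\theta}_s)n$. This concludes step 1.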
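The two ingredients of Step 1, the chi quantile in (54) and the tail mean in (56), can be evaluated with a short standard-library Python sketch. This is an illustration under assumptions, not the paper's implementation: the incomplete gamma function is taken to be the regularized one, the helper names `gammainc`, `gammainc_inv`, `chi_quantile` and `chi_tail_mean` are hypothetical, and the helpers take the shape parameter first, unlike the argument order used in the text.

```python
import math

def gammainc(a, x):
    """Regularized lower incomplete gamma function P(a, x), summed as a power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(100000):
        ap += 1.0
        term *= x / ap
        total += term
        if term < total * 1e-17:
            break
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))

def gammainc_inv(a, p):
    """Solve P(a, x) = p for x by bisection (intended for 0 <= p < 1)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return math.inf
    lo, hi = 0.0, 1.0
    while gammainc(a, hi) < p and hi < 512.0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gammainc(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi_quantile(p, d):
    """Inverse cdf of the chi distribution with d degrees of freedom, as in (54)."""
    return math.sqrt(2.0 * gammainc_inv(0.5 * d, p))

def chi_tail_mean(p, d):
    """Integral of t dF(t) over t >= chi_quantile(p, d), as in (56)."""
    coef = math.sqrt(2.0) * math.exp(math.lgamma(0.5 * (d + 1)) - math.lgamma(0.5 * d))
    return coef * (1.0 - gammainc(0.5 * (d + 1), gammainc_inv(0.5 * d, p)))
```

For $d=2$ the chi cdf is $1-e^{-t^2/2}$, so the quantile has a closed form that can be used as a sanity check; for $d=1$ the tail mean reduces to twice the standard normal density at the quantile.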
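Step 2:

In this step we compute $\lim_{n\to\infty}\Big(\dfrac{E\sum_{i=c_s+1}^{n}\bar{H}_i^2}{n}-\dfrac{\big(E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n(n-c_s)}\Big)$ with $c_s=(1-\hat{\theta}_s)n$. Using the results from step 1 we easily find

$$\lim_{n\to\infty}\frac{\big(E(\bar{H}^Tz)-E\sum_{i=1}^{c_s}\bar{H}_i\big)^2}{n(n-c_s)}=\frac{\Big(\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(\big(1-\gamma_{inc}(\gamma_{inc}^{-1}(1-\hat{\theta}_s,\frac{d}{2}),\frac{d+1}{2})\big)-2\big(1-\gamma_{inc}(\gamma_{inc}^{-1}(1-\beta_s,\frac{d}{2}),\frac{d+1}{2})\big)\Big)\Big)^2}{\hat{\theta}_s}. \tag{61}$$

Effectively, what is left to compute is $\lim_{n\to\infty}\dfrac{E\sum_{i=c_s+1}^{n}\bar{H}_i^2}{n}$. Using an approach similar to the one used in step 1 and following [8,77] we have

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\hat{\theta}_s)n+1}^{n}\bar{H}_i^2}{n}=\int_{F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)}^{\infty}t\,dF_{\chi_d^2}(t) \tag{62}$$

where $F_{\chi_d^2}(\cdot)$ is the cdf of the chi-square random variable with $d$ degrees of freedom and naturally $F_{\chi_d^2}^{-1}(\cdot)$ is the inverse cdf of the chi-square random variable with $d$ degrees of freedom. We then have

$$dF_{\chi_d^2}(t)=\frac{2^{-\frac{d}{2}}}{\Gamma(\frac{d}{2})}t^{\frac{d}{2}-1}e^{-\frac{t}{2}}dt,\quad t\ge 0 \tag{63}$$

where as earlier $\Gamma(\cdot)$ stands for the gamma function. The following integration then gives us $F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)$. Namely,

$$\frac{2^{-\frac{d}{2}}}{\Gamma(\frac{d}{2})}\int_{0}^{F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)}t^{\frac{d}{2}-1}e^{-\frac{t}{2}}dt=1-\hat{\theta}_s\ \Longrightarrow\ F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)=2\gamma_{inc}^{-1}\Big(1-\hat{\theta}_s,\frac{d}{2}\Big) \tag{64}$$

where as earlier $\gamma_{inc}^{-1}(\cdot,\cdot)$ is the inverse incomplete gamma function. We then find

$$\int_{F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)}^{\infty}t\,dF_{\chi_d^2}(t)=\frac{2^{-\frac{d}{2}}}{\Gamma(\frac{d}{2})}\int_{F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)}^{\infty}t^{\frac{d}{2}}e^{-\frac{t}{2}}dt=\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\frac{F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)}{2},\frac{d+2}{2}\Big)\Big) \tag{65}$$

where as earlier $\gamma_{inc}(\cdot,\cdot)$ stands for the incomplete gamma function. From (64) and (65) we obtain

$$\int_{F_{\chi_d^2}^{-1}(1-\hat{\theta}_s)}^{\infty}t\,dF_{\chi_d^2}(t)=\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(1-\hat{\theta}_s,\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big). \tag{66}$$

Combination of (62) and (66) produces

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\hat{\theta}_s)n+1}^{n}\bar{H}_i^2}{n}=\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(1-\hat{\theta}_s,\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big). \tag{67}$$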

We summarize the results from this section in the following theorem. 

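Theorem 3. (Strong threshold) Let $A$ be a $dm\times dn$ measurement matrix in (1) with the null-space uniformly distributed in the Grassmannian. Let the unknown $x$ in (1) be $k$-block-sparse with the length of its blocks $d$. Let $k,m,n$ be large and let $\alpha=\frac{m}{n}$ and $\beta_s=\frac{k}{n}$ be constants independent of $m$ and $n$. Let $\gamma_{inc}(\cdot,\cdot)$ be the incomplete gamma function and let $\gamma_{inc}^{-1}(\cdot,\cdot)$ be the inverse of the incomplete gamma function. Further, let $\epsilon>0$ be an arbitrarily small constant and $\hat{\theta}_s$, ($\beta_s\le\hat{\theta}_s\le 1$) be the solution of

$$(1-\epsilon)\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\cdot\frac{\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(1-\theta_s,\frac{d}{2}),\frac{d+1}{2}\big)\Big)-2\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(1-\beta_s,\frac{d}{2}),\frac{d+1}{2}\big)\Big)}{\theta_s}-\sqrt{2\gamma_{inc}^{-1}\Big((1+\epsilon)(1-\theta_s),\frac{d}{2}\Big)}=0. \tag{68}$$

If $\alpha$ and $\beta_s$ further satisfy

$$\alpha d>\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(1-\hat{\theta}_s,\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big)-\frac{\Big(\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(\big(1-\gamma_{inc}(\gamma_{inc}^{-1}(1-\hat{\theta}_s,\frac{d}{2}),\frac{d+1}{2})\big)-2\big(1-\gamma_{inc}(\gamma_{inc}^{-1}(1-\beta_s,\frac{d}{2}),\frac{d+1}{2})\big)\Big)\Big)^2}{\hat{\theta}_s} \tag{69}$$

then the solutions of (1) and (3) coincide with overwhelming probability.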

Proof. Follows from the previous discussion combining ©, ®, OB, (@2]>, d2)]>, ([60]), ([6T]>, and ([67]>. □ 
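The conditions of Theorem 3 can be evaluated numerically. The following Python sketch is an assumption-laden illustration rather than the procedure used to produce the paper's figures: it takes the incomplete gamma function to be the regularized one, sets $\epsilon=0$ by default, solves (68) for $\theta_s$ by bisection on $(\beta_s,1)$ (which presumes a sign change there, as is the case for small $\beta_s$), and returns the value of $\alpha$ at which (69) holds with equality. All function names are hypothetical.

```python
import math

def gammainc(a, x):
    """Regularized lower incomplete gamma function P(a, x), summed as a power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(100000):
        ap += 1.0
        term *= x / ap
        total += term
        if term < total * 1e-17:
            break
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))

def gammainc_inv(a, p):
    """Solve P(a, x) = p for x by bisection (intended for 0 <= p < 1)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return math.inf
    lo, hi = 0.0, 1.0
    while gammainc(a, hi) < p and hi < 512.0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gammainc(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def strong_threshold(beta, d, eps=0.0):
    """Return (alpha, theta): theta solves (68), alpha is the boundary value in (69)."""
    half = 0.5 * d
    coef = math.sqrt(2.0) * math.exp(math.lgamma(half + 0.5) - math.lgamma(half))

    def tail(p, shape):
        # 1 - gamma_inc(gamma_inc^{-1}(p, d/2), shape)
        return 1.0 - gammainc(shape, gammainc_inv(half, p))

    tail_beta = tail(1.0 - beta, half + 0.5)

    def numerator(theta):
        return coef * (tail(1.0 - theta, half + 0.5) - 2.0 * tail_beta)

    def residual(theta):
        quantile = math.sqrt(2.0 * gammainc_inv(half, (1.0 + eps) * (1.0 - theta)))
        return (1.0 - eps) * numerator(theta) / theta - quantile

    lo, hi = beta, 1.0  # residual is negative near beta and positive near 1 for small beta
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    second_moment = d * tail(1.0 - theta, half + 1.0)  # 2*Gamma((d+2)/2)/Gamma(d/2) equals d
    alpha = (second_moment - numerator(theta) ** 2 / theta) / d
    return alpha, theta
```

For example, `strong_threshold(0.01, 1)` returns the smallest $\alpha$ for which block-sparsity $\beta_s=0.01$ is guaranteed by (69) when $d=1$, together with the corresponding $\hat{\theta}_s$.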

The results for the strong threshold obtained from the above theorem for different block-lengths $d$ are presented in Figure 3. The case of large $d$ was considered in [78, 81] and is given for comparison as $d\to\infty$ in Figure 3 as well. (In Section 5 we will show how the results given in [78, 81] follow from the above presented analysis.) Increasing the block-length introduces, so to say, more structure on the unknown signals. One would then expect that recoverable thresholds should be higher as $d$ increases. Figure 3 hints that the $\ell_2/\ell_1$-optimization algorithm from (3) possibly indeed recovers higher block-sparsity as the block length increases.

Figure 3: Block-sparse strong thresholds as a function of block-length $d$; $\ell_2/\ell_1$-optimization

4.2 Sectional threshold

In this subsection we determine the sectional threshold $\beta_{sec}$. Before proceeding further we quickly recall the definition of the sectional threshold. Namely, for a given $\alpha$, $\beta_{sec}$ is the maximum value of $\beta$ such that the solutions of (1) and (3) coincide for any given $\beta n$-block-sparse $x$ with a fixed location of nonzero blocks. Since the analysis that follows does not depend on which particular location of nonzero blocks is chosen, we can for the simplicity of the exposition and without loss of generality assume that the blocks $X_1,X_2,\dots,X_{n-k}$ of $x$ are equal to zero (i.e. they are zero blocks). Under this assumption we have the following corollary of Theorem 1.

Corollary 1 (Nonzero part of $x$ has a fixed location). Assume that a $dm\times dn$ measurement matrix $A$ is given. Let $x$ be a $k$-block-sparse vector. Also let $X_1=X_2=\dots=X_{n-k}=0$. Further, assume that $y=Ax$ and that $w$ is a $dn\times 1$ vector. Then (3) will produce the solution of (1) if

$$(\forall w\in R^{dn}\,|\,Aw=0)\quad\sum_{i=n-k+1}^{n}\|W_i\|_2<\sum_{i=1}^{n-k}\|W_i\|_2. \tag{70}$$
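To make the inequality in (70) concrete, the short Python sketch below checks it for a single given vector $w$. It is only an illustration with hypothetical function names; verifying the corollary itself would require the inequality for every $w$ in the null-space of $A$, which this snippet does not attempt.

```python
import math

def block_norms(w, d):
    """Euclidean norms of the consecutive length-d blocks W_1, ..., W_n of w."""
    return [math.sqrt(sum(x * x for x in w[i:i + d])) for i in range(0, len(w), d)]

def satisfies_condition_70(w, d, k):
    """True if the last k block norms sum to strictly less than the first n-k block norms."""
    norms = block_norms(w, d)
    n = len(norms)
    return sum(norms[n - k:]) < sum(norms[:n - k])
```

With $d=2$, $k=1$ and $w=(3,4,0,0,1,0)$ the block norms are $5,0,1$, so the inequality $1<5$ holds.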



Following the procedure of Subsection 4.1 we set

$$S_{sec}=\Big\{w\in S^{dn-1}\ \Big|\ \sum_{i=n-k+1}^{n}\|W_i\|_2<\sum_{i=1}^{n-k}\|W_i\|_2\Big\} \tag{71}$$

and

$$w(S_{sec})=E\sup_{w\in S_{sec}}(h^Tw) \tag{72}$$

where as earlier $h$ is a random column vector in $R^{dn}$ with i.i.d. $\mathcal{N}(0,1)$ components and $S^{dn-1}$ is the unit $dn$-dimensional sphere. As in Subsection 4.1 our goal will be to compute an upper bound on $w(S_{sec})$ and then equal that upper bound to $\big(\sqrt{dm}-\frac{1}{4\sqrt{dm}}\big)$. In the following subsections we present a way to get such an upper bound. As earlier, we set $w(h,S_{sec})=\max_{w\in S_{sec}}(h^Tw)$. Following the strategy of the previous sections, in Subsection 4.2.1 we determine an upper bound $B_{sec}$ on $w(h,S_{sec})$. In Subsection 4.2.2 we will compute an upper bound on $E(B_{sec})$. That quantity will be an upper bound on $w(S_{sec})$ since

$$w(S_{sec})=Ew(h,S_{sec})=E\big(\max_{w\in S_{sec}}(h^Tw)\big)\le E(B_{sec}). \tag{73}$$

4.2.1 Upper-bounding $w(h,S_{sec})$

The following sequence of equalities is analogous to the one from Subsection 4.1.1

$$w(h,S_{sec})=\max_{w\in S_{sec}}(h^Tw)=\max_{w\in S_{sec}}\sum_{i=1}^{n}|H_i^TW_i|=\max_{w\in S_{sec}}\sum_{i=1}^{n}\|H_i\|_2\|W_i\|_2. \tag{74}$$

Let $H_{(norm)}^{(n-k)}=(\|H_1\|_2,\|H_2\|_2,\dots,\|H_{n-k}\|_2)$. Further, let $|H_{(norm)}^{(n-k)}|_{(i)}$ be the $i$-th smallest of the elements of $H_{(norm)}^{(n-k)}$. Set

$$\tilde{H}=\big(|H_{(norm)}^{(n-k)}|_{(1)},|H_{(norm)}^{(n-k)}|_{(2)},\dots,|H_{(norm)}^{(n-k)}|_{(n-k)},\|H_{n-k+1}\|_2,\|H_{n-k+2}\|_2,\dots,\|H_n\|_2\big)^T. \tag{75}$$

If $w\in S_{sec}$ then a vector obtained by permuting the first $n-k$ blocks of $w$ in any possible way is also in $S_{sec}$. Then (74) can be rewritten as

$$w(h,S_{sec})=\max_{w\in S_{sec}}\sum_{i=1}^{n}\tilde{H}_i\|W_i\|_2 \tag{76}$$

where $\tilde{H}_i$ is the $i$-th element of vector $\tilde{H}$. Let $y=(y_1,y_2,\dots,y_n)^T\in R^n$. Then one can simplify (76) in the following way

$$\begin{aligned}w(h,S_{sec})=\max_{y\in R^n}\ &\sum_{i=1}^{n}\tilde{H}_iy_i\\\text{subject to}\ &y_i\ge 0,\ 1\le i\le n\\&\sum_{i=n-k+1}^{n}y_i\ge\sum_{i=1}^{n-k}y_i\\&\sum_{i=1}^{n}y_i^2\le 1.\end{aligned} \tag{77}$$

One can then proceed in a fashion similar to the one from Subsection 4.1.1 and compute an upper bound based on duality. The only difference is that we now have $\tilde{H}$ instead of $\bar{H}$. After repeating literally every step of the derivation from Subsection 4.1.1 one obtains the following analogue to the equation (30)

$$w(h,S_{sec})\le\sqrt{\sum_{i=1}^{n}\tilde{H}_i^2-\sum_{i=1}^{c}\tilde{H}_i^2-\frac{\big((\tilde{H}^Tz)-\sum_{i=1}^{c}\tilde{H}_i\big)^2}{n-c}}=\sqrt{\sum_{i=c+1}^{n}\tilde{H}_i^2-\frac{\big((\tilde{H}^Tz)-\sum_{i=1}^{c}\tilde{H}_i\big)^2}{n-c}} \tag{78}$$

where $c\le(n-k)$ is such that $\big((\tilde{H}^Tz)-\sum_{i=1}^{c}\tilde{H}_i\big)\ge 0$. As earlier, as long as $(\tilde{H}^Tz)\ge 0$ there will be a $c$ (it is possible that $c=0$) such that the quantity on the rightmost side of (78) is an upper bound on $w(h,S_{sec})$.

Using (78) we then establish the following analogue to Lemma 1.

Lemma 4. Let $h\in R^{dn}$ be a vector with i.i.d. zero-mean unit variance Gaussian components. Further let $\tilde{H}$ be as defined in (75) and $w(h,S_{sec})=\max_{w\in S_{sec}}(h^Tw)$ where $S_{sec}$ is as defined in (71). Let $z\in R^n$ be a column vector such that $z_i=1$, $1\le i\le(n-k)$, and $z_i=-1$, $n-k+1\le i\le n$. Then

$$w(h,S_{sec})\le B_{sec} \tag{79}$$

where

$$B_{sec}=\begin{cases}\sqrt{\sum_{i=1}^{n}\tilde{H}_i^2} & \text{if } \zeta_{sec}(h,c_{sec})\le 0\\[4pt] \sqrt{\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2-\dfrac{\big((\tilde{H}^Tz)-\sum_{i=1}^{c_{sec}}\tilde{H}_i\big)^2}{n-c_{sec}}} & \text{if } \zeta_{sec}(h,c_{sec})>0,\end{cases} \tag{80}$$

$\zeta_{sec}(h,c)=\dfrac{(\tilde{H}^Tz)-\sum_{i=1}^{c}\tilde{H}_i}{n-c}-\tilde{H}_c$, and $c_{sec}=\delta_{sec}n$ is a $c\le n-k$ such that

$$\frac{(1-\epsilon)E\big((\tilde{H}^Tz)-\sum_{i=1}^{c}\tilde{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n-k}\Big)=0. \tag{81}$$

$F_{\chi_d}^{-1}(\cdot)$ is the inverse cdf of the chi random variable with $d$ degrees of freedom. $\epsilon>0$ is an arbitrarily small constant independent of $n$.

Proof. Follows directly from the derivation before Lemma 1. □
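4.2.2 Computing an upper bound on $E(B_{sec})$

Following step-by-step the derivation of Lemma 3 (with a trivial adjustment in finding the Lipschitz constant $\sigma$) we can establish the sectional threshold analogue to it.

Lemma 5. Assume the setup of Lemma 4. Let further $\psi_{sec}=\dfrac{E(\tilde{H}^Tz)-E\sum_{i=1}^{c_{sec}}\tilde{H}_i}{n}$. Then

$$E(B_{sec})\le\sqrt{dn}\Big(\exp\Big\{-\frac{n\epsilon^2\delta_{sec}}{2(1+\epsilon)}\Big\}+\exp\Big\{-\frac{\epsilon^2\psi_{sec}^2n}{2}\Big\}\Big)+\sqrt{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2-\frac{\big(E(\tilde{H}^Tz)-E\sum_{i=1}^{c_{sec}}\tilde{H}_i\big)^2}{n-c_{sec}}}. \tag{82}$$

Proof. Follows directly from the derivation before Lemma 3. □

Similarly to (50), if $n$ is large, for a fixed $\alpha$ one can determine $\beta_{sec}$ as a maximum $\beta$ such that

$$\alpha d>\frac{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2}{n}-\frac{\big(E(\tilde{H}^Tz)-E\sum_{i=1}^{c_{sec}}\tilde{H}_i\big)^2}{n(n-c_{sec})}. \tag{83}$$

In the rest of this subsection we show how the right-hand side of (83) can be computed for a randomly chosen fixed $\beta_{sec}$. We again, as earlier, do so in two steps:

1. We first determine $c_{sec}$.

2. We then compute $\lim_{n\to\infty}\Big(\dfrac{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2}{n}-\dfrac{\big(E(\tilde{H}^Tz)-E\sum_{i=1}^{c_{sec}}\tilde{H}_i\big)^2}{n(n-c_{sec})}\Big)$ with $c_{sec}$ found in step 1.

Step 1:

From Lemma 4 we have $c_{sec}=\delta_{sec}n$ is a $c$ such that

$$\begin{aligned}&\frac{(1-\epsilon)E\big(\sum_{i=1}^{n-\beta_{sec}n}\tilde{H}_i-\sum_{i=n-\beta_{sec}n+1}^{n}\tilde{H}_i-\sum_{i=1}^{c}\tilde{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_{sec})}\Big)=0\\\Leftrightarrow\ &\frac{(1-\epsilon)\big(E\sum_{i=1}^{n-\beta_{sec}n}\tilde{H}_i-E\sum_{i=n-\beta_{sec}n+1}^{n}\|H_i\|_2-E\sum_{i=1}^{c}\tilde{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_{sec})}\Big)=0\end{aligned} \tag{84}$$

where as earlier $\tilde{H}_i=|H_{(norm)}^{(n-k)}|_{(i)}$, $1\le i\le(n-\beta_{sec}n)$, is the $i$-th smallest magnitude of blocks $H_i$, $1\le i\le(n-\beta_{sec}n)$. We also recall that $\|H_i\|_2$, $n-\beta_{sec}n+1\le i\le n$, are the magnitudes of the last $\beta_{sec}n$ blocks of vector $h$ (these magnitudes of the last $\beta_{sec}n$ blocks of vector $h$ are not sorted). As earlier, all components of $h$ are i.i.d. zero-mean unit variance Gaussian random variables and $\epsilon>0$ is an arbitrarily small constant. Then since $\|H_i\|_2$ is a chi-distributed random variable with $d$ degrees of freedom we clearly have $E\|H_i\|_2=\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}$, $n-\beta_{sec}n+1\le i\le n$. Then from (84)

$$\begin{aligned}&\frac{(1-\epsilon)\Big(E\sum_{i=c+1}^{n-\beta_{sec}n}\tilde{H}_i-\beta_{sec}n\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_{sec})}\Big)=0\\\Leftrightarrow\ &\frac{(1-\epsilon)\Big(E\sum_{i=\delta_{sec}n+1}^{(1-\beta_{sec})n}\tilde{H}_i-\beta_{sec}n\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big)}{n(1-\delta_{sec})}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)\delta_{sec}}{1-\beta_{sec}}\Big)=0.\end{aligned} \tag{85}$$

Set $\theta_{sec}=1-\delta_{sec}$. Following the derivation of (57) we have

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\theta_{sec})n+1}^{(1-\beta_{sec})n}\tilde{H}_i}{n}=(1-\beta_{sec})\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(\frac{1-\theta_{sec}}{1-\beta_{sec}},\frac{d}{2}\Big),\frac{d+1}{2}\Big)\Big). \tag{86}$$

Similarly to (59) we easily determine

$$F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)(1-\theta_{sec})}{1-\beta_{sec}}\Big)=\sqrt{2\gamma_{inc}^{-1}\Big(\frac{(1+\epsilon)(1-\theta_{sec})}{1-\beta_{sec}},\frac{d}{2}\Big)}. \tag{87}$$

Combination of (84), (85), (86), and (87) gives us the following equation for computing $\theta_{sec}$

$$(1-\epsilon)\frac{(1-\beta_{sec})\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\theta_{sec}}{1-\beta_{sec}},\frac{d}{2}),\frac{d+1}{2}\big)\Big)-\beta_{sec}\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}}{\theta_{sec}}-\sqrt{2\gamma_{inc}^{-1}\Big(\frac{(1+\epsilon)(1-\theta_{sec})}{1-\beta_{sec}},\frac{d}{2}\Big)}=0. \tag{88}$$

Let $\hat{\theta}_{sec}$ be the solution of (88). Then $\delta_{sec}=1-\hat{\theta}_{sec}$ and $c_{sec}=\delta_{sec}n=(1-\hat{\theta}_{sec})n$. This concludes step 1.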
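The closed form for $E\|H_i\|_2$ used in Step 1 is the mean of a chi random variable with $d$ degrees of freedom. A minimal Python sketch (the function name `chi_mean` is hypothetical) evaluates it through log-gamma values to avoid overflow for large $d$:

```python
import math

def chi_mean(d):
    """Mean of the norm of d i.i.d. N(0,1) variables: sqrt(2)*Gamma((d+1)/2)/Gamma(d/2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma(0.5 * (d + 1)) - math.lgamma(0.5 * d))
```

For $d=1$ this gives $\sqrt{2/\pi}$ and for $d=2$ it gives $\sqrt{\pi/2}$; by Jensen's inequality the value is always below $\sqrt{d}$.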

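Step 2:

In this step we compute $\lim_{n\to\infty}\Big(\dfrac{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2}{n}-\dfrac{\big(E(\tilde{H}^Tz)-E\sum_{i=1}^{c_{sec}}\tilde{H}_i\big)^2}{n(n-c_{sec})}\Big)$ with $c_{sec}=(1-\hat{\theta}_{sec})n$. Using the results from step 1 we easily find

$$\lim_{n\to\infty}\frac{\big(E(\tilde{H}^Tz)-E\sum_{i=1}^{c_{sec}}\tilde{H}_i\big)^2}{n(n-c_{sec})}=\frac{\Big((1-\beta_{sec})\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\hat{\theta}_{sec}}{1-\beta_{sec}},\frac{d}{2}),\frac{d+1}{2}\big)\Big)-\beta_{sec}\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big)^2}{\hat{\theta}_{sec}}. \tag{89}$$

What is left to compute is $\lim_{n\to\infty}\dfrac{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2}{n}$. We first observe

$$\frac{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2}{n}=\frac{E\sum_{i=c_{sec}+1}^{(1-\beta_{sec})n}\tilde{H}_i^2}{n}+\frac{E\sum_{i=(1-\beta_{sec})n+1}^{n}\|H_i\|_2^2}{n}=\frac{E\sum_{i=(1-\hat{\theta}_{sec})n+1}^{(1-\beta_{sec})n}\tilde{H}_i^2}{n}+\beta_{sec}d. \tag{90}$$

Following the derivation of (67) we also have

$$\lim_{n\to\infty}\frac{E\sum_{i=(1-\hat{\theta}_{sec})n+1}^{(1-\beta_{sec})n}\tilde{H}_i^2}{n(1-\beta_{sec})}=\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(\frac{1-\hat{\theta}_{sec}}{1-\beta_{sec}},\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big). \tag{91}$$

Combining (90) and (91) we find

$$\lim_{n\to\infty}\frac{E\sum_{i=c_{sec}+1}^{n}\tilde{H}_i^2}{n}=(1-\beta_{sec})\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(\frac{1-\hat{\theta}_{sec}}{1-\beta_{sec}},\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big)+\beta_{sec}d. \tag{92}$$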



We summarize the results from this section in the following theorem. 

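Theorem 4. (Sectional threshold) Let $A$ be a $dm\times dn$ measurement matrix in (1) with the null-space uniformly distributed in the Grassmannian. Let the unknown $x$ in (1) be $k$-block-sparse with the length of its blocks $d$. Further, let the location of nonzero blocks of $x$ be arbitrarily chosen but fixed. Let $k,m,n$ be large and let $\alpha=\frac{m}{n}$ and $\beta_{sec}=\frac{k}{n}$ be constants independent of $m$ and $n$. Let $\gamma_{inc}(\cdot,\cdot)$ and $\gamma_{inc}^{-1}(\cdot,\cdot)$ be the incomplete gamma function and its inverse, respectively. Further, let $\epsilon>0$ be an arbitrarily small constant and $\hat{\theta}_{sec}$, ($\beta_{sec}\le\hat{\theta}_{sec}\le 1$) be the solution of

$$(1-\epsilon)\frac{(1-\beta_{sec})\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\theta_{sec}}{1-\beta_{sec}},\frac{d}{2}),\frac{d+1}{2}\big)\Big)-\beta_{sec}\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}}{\theta_{sec}}-\sqrt{2\gamma_{inc}^{-1}\Big(\frac{(1+\epsilon)(1-\theta_{sec})}{1-\beta_{sec}},\frac{d}{2}\Big)}=0. \tag{93}$$

If $\alpha$ and $\beta_{sec}$ further satisfy

$$\alpha d>(1-\beta_{sec})\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(\frac{1-\hat{\theta}_{sec}}{1-\beta_{sec}},\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big)+\beta_{sec}d-\frac{\Big((1-\beta_{sec})\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\hat{\theta}_{sec}}{1-\beta_{sec}},\frac{d}{2}),\frac{d+1}{2}\big)\Big)-\beta_{sec}\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big)^2}{\hat{\theta}_{sec}} \tag{94}$$

then the solutions of (1) and (3) coincide with overwhelming probability.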

Proof Follows from the previous discussion combining ©, (|73l . (1791 . (|82~1) . (l83l . (l88l) . (l89l) . and ^92j». □ 

The results for the sectional threshold obtained from the above theorem for different block-lengths $d$ are presented in Figure 4. We also show in Figure 4 the results from [78, 81] when $d\to\infty$. (These results were derived for the strong threshold; however, any lower bound on the strong threshold is automatically a lower bound on the sectional threshold as well.) In the following section we will explicitly show how the results shown in Figure 4 for $d\to\infty$ follow from the derivation given above.

Figure 4: Block-sparse sectional thresholds as a function of block length $d$; $\ell_2/\ell_1$-optimization

4.3 Weak threshold

In this subsection we determine the weak threshold $\beta_w$. Before proceeding further we again quickly recall the definition of the weak threshold. Namely, for a given $\alpha$, $\beta_w$ is the maximum value of $\beta$ such that the solutions of (1) and (3) coincide for any $\beta n$-block-sparse $x$ with a given fixed location of nonzero blocks and given fixed directions of nonzero block vectors $X_i$. As in Subsection 4.2 we can for the simplicity of the exposition and without loss of generality assume that the blocks $X_1,X_2,\dots,X_{n-k}$ of $x$ are equal to zero and that the vectors $X_{n-k+1},X_{n-k+2},\dots,X_n$ have fixed directions. Furthermore, since all probability distributions of interest will be rotationally invariant we will later assume that $X_i=(\|X_i\|_2,0,0,\dots,0)$, $n-k+1\le i\le n$. We first have the following corollary of Theorem 1.

Corollary 2. (Nonzero blocks of $x$ have fixed directions and location) Assume that a $dm\times dn$ measurement matrix $A$ is given. Let $x$ be a $k$-block-sparse vector. Also let $X_1=X_2=\dots=X_{n-k}=0$. Let the directions of vectors $X_{n-k+1},X_{n-k+2},\dots,X_n$ be fixed. Further, assume that $y=Ax$ and that $w$ is a $dn\times 1$ vector. Then (3) will produce the solution of (1) if

$$(\forall w\in R^{dn}\,|\,Aw=0)\quad-\sum_{i=n-k+1}^{n}\frac{X_i^TW_i}{\|X_i\|_2}<\sum_{i=1}^{n-k}\|W_i\|_2. \tag{95}$$



Proof. The proof closely follows the proof of Theorem 1 given in [83]. Let $x$ be the solution of (1) and let $\hat{x}$ be the solution of (3). Also, assume $\hat{x}\ne x$, i.e. assume $\sum_{i=1}^{n}\|\hat{X}_i\|_2\le\sum_{i=1}^{n}\|X_i\|_2$ where $\hat{X}_i=(\hat{x}_{(i-1)d+1},\hat{x}_{(i-1)d+2},\dots,\hat{x}_{id})^T$ and $X_i=(x_{(i-1)d+1},x_{(i-1)d+2},\dots,x_{id})^T$, for $i=1,2,\dots,n$. With $W_i=\hat{X}_i-X_i$ we can then write

$$\begin{aligned}\sum_{i=1}^{n}\|\hat{X}_i\|_2&=\sum_{i=1}^{n}\|\hat{X}_i-X_i+X_i\|_2=\sum_{i=n-k+1}^{n}\|\hat{X}_i-X_i+X_i\|_2+\sum_{i=1}^{n-k}\|\hat{X}_i-X_i+X_i\|_2\\&=\sum_{i=n-k+1}^{n}\|W_i+X_i\|_2+\sum_{i=1}^{n-k}\|W_i\|_2\ge\sum_{i=n-k+1}^{n}\Big|\|X_i\|_2+\frac{X_i^TW_i}{\|X_i\|_2}\Big|+\sum_{i=1}^{n-k}\|W_i\|_2\\&\ge\sum_{i=n-k+1}^{n}\|X_i\|_2+\sum_{i=n-k+1}^{n}\frac{X_i^TW_i}{\|X_i\|_2}+\sum_{i=1}^{n-k}\|W_i\|_2=\sum_{i=1}^{n}\|X_i\|_2+\sum_{i=n-k+1}^{n}\frac{X_i^TW_i}{\|X_i\|_2}+\sum_{i=1}^{n-k}\|W_i\|_2.\end{aligned} \tag{96}$$

If (95) holds then from (96) $\sum_{i=1}^{n}\|\hat{X}_i\|_2>\sum_{i=1}^{n}\|X_i\|_2$, which contradicts the assumption $\sum_{i=1}^{n}\|\hat{X}_i\|_2\le\sum_{i=1}^{n}\|X_i\|_2$. Therefore, $\hat{x}=x$. This concludes the proof. □
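The only non-trivial step in (96) is the per-block inequality $\|W_i+X_i\|_2\ge\|X_i\|_2+\frac{X_i^TW_i}{\|X_i\|_2}$, which says that the norm of a vector is at least its component along the direction of $X_i$. A small Python sketch (the function name `projection_gap` is hypothetical) returns the slack in that inequality for given $X_i\ne 0$ and $W_i$:

```python
import math

def projection_gap(x, w):
    """Return ||w + x||_2 - (||x||_2 + x^T w / ||x||_2); non-negative for any x != 0."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_sum = math.sqrt(sum((a + b) ** 2 for a, b in zip(x, w)))
    inner = sum(a * b for a, b in zip(x, w))
    return norm_sum - (norm_x + inner / norm_x)
```

The gap vanishes when $W_i$ is a non-negative multiple of $X_i$, which is why fixing the directions of the nonzero blocks matters for the weak threshold.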



Following the procedure of Subsection 4.2 we set

$$S_w'=\Big\{w\in S^{dn-1}\ \Big|\ -\sum_{i=n-k+1}^{n}\frac{X_i^TW_i}{\|X_i\|_2}<\sum_{i=1}^{n-k}\|W_i\|_2\Big\} \tag{97}$$

and

$$w(S_w')=E\sup_{w\in S_w'}(h^Tw) \tag{98}$$

where as earlier $h$ is a random column vector in $R^{dn}$ with i.i.d. $\mathcal{N}(0,1)$ components and $S^{dn-1}$ is the unit $dn$-dimensional sphere. Let $\Theta_i$ be the orthogonal matrices such that $X_i^T\Theta_i=(\|X_i\|_2,0,\dots,0)$, $n-k+1\le i\le n$. Set

$$S_w=\Big\{w\in S^{dn-1}\ \Big|\ -\sum_{i=n-k+1}^{n}w_{(i-1)d+1}<\sum_{i=1}^{n-k}\|W_i\|_2\Big\} \tag{99}$$

and

$$w(S_w)=E\sup_{w\in S_w}(h^Tw). \tag{100}$$

Since $H_i^T$ and $H_i^T\Theta_i$ have the same distribution we have $w(S_w)=w(S_w')$. As in Subsections 4.1 and 4.2 our goal will then again be to compute an upper bound on $w(S_w)$ and subsequently equal that upper bound to $\big(\sqrt{dm}-\frac{1}{4\sqrt{dm}}\big)$. Following the strategy of the previous sections, in Subsection 4.3.1 we will determine an upper bound $B_w$ on $w(h,S_w)$. In Subsection 4.3.2 we will compute an upper bound on $E(B_w)$. That quantity will be an upper bound on $w(S_w)$ since

$$w(S_w)=Ew(h,S_w)=E\big(\max_{w\in S_w}(h^Tw)\big)\le E(B_w). \tag{101}$$

4.3.1 Upper-bounding $w(h,S_w)$

Let $H_i^*=(h_{(i-1)d+2},h_{(i-1)d+3},\dots,h_{id})^T$, $W_i^*=(w_{(i-1)d+2},w_{(i-1)d+3},\dots,w_{id})^T$, $i=n-k+1,n-k+2,\dots,n$. One then writes, in a way analogous to the corresponding equalities of Subsections 4.1.1 and 4.2.1,

$$w(h,S_w)=\max_{w\in S_w}(h^Tw)=\max_{w\in S_w}\Big(\sum_{i=n-k+1}^{n}h_{(i-1)d+1}w_{(i-1)d+1}+\sum_{i=n-k+1}^{n}\|H_i^*\|_2\|W_i^*\|_2+\sum_{i=1}^{n-k}\|H_i\|_2\|W_i\|_2\Big). \tag{102}$$

We recall one more time that $H_{(norm)}^{(n-k)}=(\|H_1\|_2,\|H_2\|_2,\dots,\|H_{n-k}\|_2)$ and that $|H_{(norm)}^{(n-k)}|_{(i)}$ is the $i$-th smallest of the elements of $H_{(norm)}^{(n-k)}$. Set

$$\hat{H}=\big(|H_{(norm)}^{(n-k)}|_{(1)},|H_{(norm)}^{(n-k)}|_{(2)},\dots,|H_{(norm)}^{(n-k)}|_{(n-k)},-h_{(n-k)d+1},-h_{(n-k+1)d+1},\dots,-h_{(n-1)d+1},\|H_{n-k+1}^*\|_2,\|H_{n-k+2}^*\|_2,\dots,\|H_n^*\|_2\big)^T. \tag{103}$$

Let $y=(y_1,y_2,\dots,y_{n+k})^T\in R^{n+k}$. Then one can simplify (102) in the following way

$$\begin{aligned}w(h,S_w)=\max_{y\in R^{n+k}}\ &\sum_{i=1}^{n+k}\hat{H}_iy_i\\\text{subject to}\ &y_i\ge 0,\ 1\le i\le n-k,\ n+1\le i\le n+k\\&\sum_{i=n-k+1}^{n}y_i\ge\sum_{i=1}^{n-k}y_i\\&\sum_{i=1}^{n+k}y_i^2\le 1\end{aligned} \tag{104}$$

where $\hat{H}_i$ is the $i$-th element of $\hat{H}$. Let $\hat{z}\in R^{n+k}$ be a vector such that $\hat{z}_i=1$, $1\le i\le n-k$, $\hat{z}_i=-1$, $n-k+1\le i\le n$, and $\hat{z}_i=0$, $n+1\le i\le n+k$. One can then proceed in a fashion similar to the one from Subsection 4.1.1 and compute an upper bound based on duality. However, there will be two important differences. First, we now have $\hat{H}$ instead of $\bar{H}$. Second, we have $\hat{z}$ instead of $z$. One should, however, note that $\|\hat{z}\|_2=\|z\|_2$. After repeating literally every step of the derivation from Subsection 4.1.1 one obtains the following analogue to equation (30)



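$$w(h,S_w)\le\sqrt{\sum_{i=1}^{n+k}\hat{H}_i^2-\sum_{i=1}^{c}\hat{H}_i^2-\frac{\big((\hat{H}^T\hat{z})-\sum_{i=1}^{c}\hat{H}_i\big)^2}{n-c}}=\sqrt{\sum_{i=c+1}^{n+k}\hat{H}_i^2-\frac{\big((\hat{H}^T\hat{z})-\sum_{i=1}^{c}\hat{H}_i\big)^2}{n-c}} \tag{105}$$

where $c\le(n-k)$ is such that $\big((\hat{H}^T\hat{z})-\sum_{i=1}^{c}\hat{H}_i\big)\ge 0$. As earlier, as long as $(\hat{H}^T\hat{z})\ge 0$ there will be a $c$ (it is possible that $c=0$) such that the quantity on the rightmost side of (105) is an upper bound on $w(h,S_w)$.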
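The bookkeeping behind the weak-threshold vectors can be sketched in a few lines of Python. This is an illustrative construction with hypothetical names (`build_weak_vectors`), assuming the nonzero blocks are the last $k$ ones and that their first coordinates are the fixed directions: the first $n-k$ block norms are sorted, each of the last $k$ blocks contributes its negated first entry and the norm of its remaining $d-1$ entries, and the companion vector has entries $1$, $-1$ and $0$ in the three groups.

```python
import math

def build_weak_vectors(h, d, k):
    """Return the (n+k)-vector H_hat built from h and its companion vector z_hat."""
    n = len(h) // d
    blocks = [h[i * d:(i + 1) * d] for i in range(n)]

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    sorted_norms = sorted(norm(b) for b in blocks[:n - k])   # sorted norms of zero blocks
    first_entries = [-b[0] for b in blocks[n - k:]]          # negated first coordinates
    rest_norms = [norm(b[1:]) for b in blocks[n - k:]]       # norms of remaining d-1 coordinates
    h_hat = sorted_norms + first_entries + rest_norms
    z_hat = [1.0] * (n - k) + [-1.0] * k + [0.0] * k
    return h_hat, z_hat
```

Two invariants follow directly from the construction and are used in the derivation: the squared entries of the $(n+k)$-vector sum to $\|h\|_2^2$, and the squared norm of the companion vector equals $n$.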

Using (105) we then establish the following analogue to Lemmas 1 and 4.

Lemma 6. Let $h\in R^{dn}$ be a vector with i.i.d. zero-mean unit variance Gaussian components. Further let $\hat{H}$ be as defined in (103) and $w(h,S_w)=\max_{w\in S_w}(h^Tw)$ where $S_w$ is as defined in (99). Let $\hat{z}\in R^{n+k}$ be a vector such that $\hat{z}_i=1$, $1\le i\le n-k$, $\hat{z}_i=-1$, $n-k+1\le i\le n$, and $\hat{z}_i=0$, $n+1\le i\le n+k$. Then

$$w(h,S_w)\le B_w \tag{106}$$

where

$$B_w=\begin{cases}\sqrt{\sum_{i=1}^{n+k}\hat{H}_i^2} & \text{if } \zeta_w(h,c_w)\le 0\\[4pt] \sqrt{\sum_{i=c_w+1}^{n+k}\hat{H}_i^2-\dfrac{\big((\hat{H}^T\hat{z})-\sum_{i=1}^{c_w}\hat{H}_i\big)^2}{n-c_w}} & \text{if } \zeta_w(h,c_w)>0,\end{cases} \tag{107}$$

$\zeta_w(h,c)=\dfrac{(\hat{H}^T\hat{z})-\sum_{i=1}^{c}\hat{H}_i}{n-c}-\hat{H}_c$, and $c_w=\delta_wn$ is a $c\le n-k$ such that

$$\frac{(1-\epsilon)E\big((\hat{H}^T\hat{z})-\sum_{i=1}^{c}\hat{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n-k}\Big)=0. \tag{108}$$

$F_{\chi_d}^{-1}(\cdot)$ is the inverse cdf of the chi random variable with $d$ degrees of freedom. $\epsilon>0$ is an arbitrarily small constant independent of $n$.

Proof. Follows directly from the derivation before Lemma 1. □



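4.3.2 Computing an upper bound on $E(B_w)$

Following step-by-step the derivation of Lemma 3 (with a trivial adjustment in finding the Lipschitz constant $\sigma$) we can establish the weak threshold analogue to it.

Lemma 7. Assume the setup of Lemma 6. Let further $\psi_w=\dfrac{E(\hat{H}^T\hat{z})-E\sum_{i=1}^{c_w}\hat{H}_i}{n}$. Then

$$E(B_w)\le\sqrt{dn}\Big(\exp\Big\{-\frac{n\epsilon^2\delta_w}{2(1+\epsilon)}\Big\}+\exp\Big\{-\frac{\epsilon^2\psi_w^2n}{2}\Big\}\Big)+\sqrt{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2-\frac{\big(E(\hat{H}^T\hat{z})-E\sum_{i=1}^{c_w}\hat{H}_i\big)^2}{n-c_w}}. \tag{109}$$

Proof. Follows directly from the derivation before Lemma 3. □

Similarly to (50) and (83), if $n$ is large, for a fixed $\alpha$ one can determine $\beta_w$ as a maximum $\beta$ such that

$$\alpha d>\frac{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2}{n}-\frac{\big(E(\hat{H}^T\hat{z})-E\sum_{i=1}^{c_w}\hat{H}_i\big)^2}{n(n-c_w)}. \tag{110}$$

In the rest of this subsection we show how the right-hand side of (110) can be computed for a randomly chosen fixed $\beta_w$. We again, as earlier, do so in two steps:

1. We first determine $c_w$.

2. We then compute $\lim_{n\to\infty}\Big(\dfrac{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2}{n}-\dfrac{\big(E(\hat{H}^T\hat{z})-E\sum_{i=1}^{c_w}\hat{H}_i\big)^2}{n(n-c_w)}\Big)$ with $c_w$ found in step 1.

Step 1:

From Lemma 6 we have $c_w=\delta_wn$ is a $c$ such that

$$\begin{aligned}&\frac{(1-\epsilon)E\big(\sum_{i=1}^{n-\beta_wn}\hat{H}_i-\sum_{i=n-\beta_wn+1}^{n}\hat{H}_i-\sum_{i=1}^{c}\hat{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_w)}\Big)=0\\\Leftrightarrow\ &\frac{(1-\epsilon)\big(E\sum_{i=1}^{n-\beta_wn}\hat{H}_i+E\sum_{i=n-\beta_wn+1}^{n}h_{(i-1)d+1}-E\sum_{i=1}^{c}\hat{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_w)}\Big)=0\\\Leftrightarrow\ &\frac{(1-\epsilon)\big(E\sum_{i=1}^{n-\beta_wn}\hat{H}_i-E\sum_{i=1}^{c}\hat{H}_i\big)}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_w)}\Big)=0\\\Leftrightarrow\ &\frac{(1-\epsilon)E\sum_{i=c+1}^{n-\beta_wn}\hat{H}_i}{n-c}-F_{\chi_d}^{-1}\Big(\frac{(1+\epsilon)c}{n(1-\beta_w)}\Big)=0.\end{aligned} \tag{111}$$

Set $\theta_w=1-\delta_w$. Then combining (111) and (86) we obtain the following equation for computing $\theta_w$

$$(1-\epsilon)(1-\beta_w)\frac{\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\theta_w}{1-\beta_w},\frac{d}{2}),\frac{d+1}{2}\big)\Big)}{\theta_w}-\sqrt{2\gamma_{inc}^{-1}\Big(\frac{(1+\epsilon)(1-\theta_w)}{1-\beta_w},\frac{d}{2}\Big)}=0. \tag{112}$$

Let $\hat{\theta}_w$ be the solution of (112). Then $\delta_w=1-\hat{\theta}_w$ and $c_w=\delta_wn=(1-\hat{\theta}_w)n$. This concludes step 1.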
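Step 2:

In this step we compute $\lim_{n\to\infty}\Big(\dfrac{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2}{n}-\dfrac{\big(E(\hat{H}^T\hat{z})-E\sum_{i=1}^{c_w}\hat{H}_i\big)^2}{n(n-c_w)}\Big)$ with $c_w=(1-\hat{\theta}_w)n$. Using results from step 1 we easily find

$$\lim_{n\to\infty}\frac{\big(E(\hat{H}^T\hat{z})-E\sum_{i=1}^{c_w}\hat{H}_i\big)^2}{n(n-c_w)}=\frac{\Big((1-\beta_w)\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\hat{\theta}_w}{1-\beta_w},\frac{d}{2}),\frac{d+1}{2}\big)\Big)\Big)^2}{\hat{\theta}_w}. \tag{113}$$

Effectively, what is left to compute is $\lim_{n\to\infty}\dfrac{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2}{n}$. We first observe

$$\begin{aligned}\frac{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2}{n}&=\frac{E\sum_{i=c_w+1}^{(1-\beta_w)n}\hat{H}_i^2}{n}+\frac{E\sum_{i=(1-\beta_w)n+1}^{n}\hat{H}_i^2}{n}+\frac{E\sum_{i=n+1}^{n+k}\hat{H}_i^2}{n}\\&=\frac{E\sum_{i=(1-\hat{\theta}_w)n+1}^{(1-\beta_w)n}\hat{H}_i^2}{n}+\frac{E\sum_{i=(1-\beta_w)n+1}^{n}h_{(i-1)d+1}^2}{n}+\frac{E\sum_{i=n-k+1}^{n}\|H_i^*\|_2^2}{n}\\&=\frac{E\sum_{i=(1-\hat{\theta}_w)n+1}^{(1-\beta_w)n}\hat{H}_i^2}{n}+\frac{\beta_wn}{n}+\frac{\beta_wn(d-1)}{n}=\frac{E\sum_{i=(1-\hat{\theta}_w)n+1}^{(1-\beta_w)n}\hat{H}_i^2}{n}+\beta_wd.\end{aligned} \tag{114}$$

Combining (114) and (91) we find

$$\lim_{n\to\infty}\frac{E\sum_{i=c_w+1}^{n+k}\hat{H}_i^2}{n}=(1-\beta_w)\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(\frac{1-\hat{\theta}_w}{1-\beta_w},\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big)+\beta_wd. \tag{115}$$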

We summarize the results from this section in the following theorem. 

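Theorem 5. (Weak threshold) Let $A$ be a $dm\times dn$ measurement matrix in (1) with the null-space uniformly distributed in the Grassmannian. Let the unknown $x$ in (1) be $k$-block-sparse with the length of its blocks $d$. Further, let the location and the directions of nonzero blocks of $x$ be arbitrarily chosen but fixed. Let $k,m,n$ be large and let $\alpha=\frac{m}{n}$ and $\beta_w=\frac{k}{n}$ be constants independent of $m$ and $n$. Let $\gamma_{inc}(\cdot,\cdot)$ and $\gamma_{inc}^{-1}(\cdot,\cdot)$ be the incomplete gamma function and its inverse, respectively. Further, let $\epsilon>0$ be an arbitrarily small constant and $\hat{\theta}_w$, ($\beta_w\le\hat{\theta}_w\le 1$) be the solution of

$$(1-\epsilon)(1-\beta_w)\frac{\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\theta_w}{1-\beta_w},\frac{d}{2}),\frac{d+1}{2}\big)\Big)}{\theta_w}-\sqrt{2\gamma_{inc}^{-1}\Big(\frac{(1+\epsilon)(1-\theta_w)}{1-\beta_w},\frac{d}{2}\Big)}=0. \tag{116}$$

If $\alpha$ and $\beta_w$ further satisfy

$$\alpha d>(1-\beta_w)\frac{2\Gamma(\frac{d+2}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\Big(\gamma_{inc}^{-1}\Big(\frac{1-\hat{\theta}_w}{1-\beta_w},\frac{d}{2}\Big),\frac{d+2}{2}\Big)\Big)+\beta_wd-\frac{\Big((1-\beta_w)\frac{\sqrt{2}\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}\Big(1-\gamma_{inc}\big(\gamma_{inc}^{-1}(\frac{1-\hat{\theta}_w}{1-\beta_w},\frac{d}{2}),\frac{d+1}{2}\big)\Big)\Big)^2}{\hat{\theta}_w} \tag{117}$$

then the solutions of (1) and (3) coincide with overwhelming probability.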
Proof. Follows from the previous discussion combining ©, (101), (106), (109), (110), (112), (113), and

The results for the weak threshold obtained from the above theorem for different block-lengths $d$ are presented on Figure 5. We also show on Figure 5 the results for $d \to \infty$ that we will discuss in more detail in the following section.

Figure 5: Block-sparse weak thresholds as a function of block length $d$, $\ell_2/\ell_1$-optimization (horizontal axis: $\alpha$)



5 $d \to \infty$

When the block length is large one can simplify the conditions for finding the thresholds obtained in the previous section. Hence, in this section we establish attainable strong, sectional, and weak thresholds when $d \to \infty$, i.e. we establish the attainable ultimate benefit of $\ell_2/\ell_1$-optimization from © when used in block-sparse recovery (Q~|). Throughout this section we choose $d \to \infty$ in order to simplify the exposition. However, as it will become obvious, the analogous simplified expressions can in fact be obtained for any value of $d$.

5.1 $d \to \infty$ - strong threshold

Following the derivation of Section 4.1.1 and its connection to Theorem 3 it is not that difficult to see that choosing $\theta_s = 1$ in (69) would provide a valid threshold condition as well ($\theta_s = 1$ is in general not optimal for a fixed value $d$, i.e. when $d$ is not large a better choice for $\theta_s$ is the one given in Theorem 3). The choice $\theta_s = 1$ gives us the following corollary of Theorem 3.

Corollary 3. (Strong threshold, $d \to \infty$) Let $A$ be a $dm \times dn$ measurement matrix in (0) with the null-space uniformly distributed in the Grassmannian. Let the unknown $x$ be $k$-block-sparse with the length of its blocks $d \to \infty$. Let $k, m, n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_s^{\infty} = \frac{k}{n}$ be constants independent of $m$ and $n$. Assume that $d$ is independent of $n$. If $\alpha$ and $\beta_s^{\infty}$ satisfy

$$\alpha > 4\beta_s^{\infty}(1 - \beta_s^{\infty}) \qquad (118)$$

then the solutions of ([7]) and (0) coincide with overwhelming probability.
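Proof. Let $\theta_s = 1$ in (69). Then from (69) we have

$$\alpha > \frac{2\Gamma\left(\frac{d+2}{2}\right)}{d\,\Gamma\left(\frac{d}{2}\right)} - \frac{1}{d}\left(\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\left(1 - 2\left(1 - \gamma_{inc}\left(\gamma_{inc}^{-1}\left(1-\beta_s,\frac{d}{2}\right),\frac{d+1}{2}\right)\right)\right)\right)^2 = 1 - \left(\left(1 - 2\left(1 - \gamma_{inc}\left(\gamma_{inc}^{-1}\left(1-\beta_s,\frac{d}{2}\right),\frac{d+1}{2}\right)\right)\right)\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\sqrt{d}\,\Gamma\left(\frac{d}{2}\right)}\right)^2. \qquad (119)$$

When $d \to \infty$ we have $\lim_{d\to\infty} \gamma_{inc}\left(\gamma_{inc}^{-1}\left(1-\beta_s,\frac{d}{2}\right),\frac{d+1}{2}\right) = 1-\beta_s$ and $\lim_{d\to\infty} \frac{1}{d}\left(\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\right)^2 = 1$. Then from (119) we obtain the following condition

$$\alpha > 1 - (1 - 2(1 - (1 - \beta_s)))^2 = 4\beta_s(1 - \beta_s). \qquad (120)$$

Since (120) is exactly the same as (118) this concludes the proof. □

The results obtained in the previous corollary precisely match those obtained in [78, 81].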
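
The gamma-ratio limit used in the proof is easy to check numerically. The sketch below is only an illustration; the helper name `gamma_ratio_sq_over_d` is ours and does not appear in the paper. It evaluates $\frac{1}{d}\left(\sqrt{2}\,\Gamma(\frac{d+1}{2})/\Gamma(\frac{d}{2})\right)^2$ through the log-gamma function so that large $d$ does not overflow.

```python
import math


def gamma_ratio_sq_over_d(d: int) -> float:
    """Return (1/d) * (sqrt(2) * Gamma((d+1)/2) / Gamma(d/2))**2.

    The ratio of gamma functions is formed in the log domain, which keeps
    the computation finite even when d is very large.
    """
    log_ratio = math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
    return 2.0 * math.exp(2.0 * log_ratio) / d


if __name__ == "__main__":
    # The printed values increase towards 1 as d grows.
    for d in (1, 2, 5, 15, 100, 10_000):
        print(d, gamma_ratio_sq_over_d(d))
```

For $d = 1$ the expression equals $2/\pi$, for $d = 15$ it is about $0.967$, and it approaches $1$ as $d$ grows, in line with the limit stated above.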

5.2 $d \to \infty$ - sectional threshold

Following the derivation of Section 4.1.1 and its connection to Theorem 4 it is not that difficult to see that choosing $\theta_{sec} = 1$ in (94) would provide a valid threshold condition as well (again, $\theta_{sec} = 1$ is in general not optimal for a fixed value $d$, i.e. when $d$ is not large a better choice for $\theta_{sec}$ is the one given in Theorem 4). Choosing $\theta_{sec} = 1$ in (94) gives us the following corollary of Theorem 4.






Corollary 4. (Sectional threshold, $d \to \infty$) Let $A$ be a $dm \times dn$ measurement matrix in ([7]) with the null-space uniformly distributed in the Grassmannian. Let the unknown $x$ in (0) be $k$-block-sparse with the length of its blocks $d \to \infty$. Further, let the location of nonzero blocks of $x$ be arbitrarily chosen but fixed. Let $k, m, n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_{sec}^{\infty} = \frac{k}{n}$ be constants independent of $m$ and $n$. Assume that $d$ is independent of $n$. If $\alpha$ and $\beta_{sec}^{\infty}$ satisfy

$$\alpha > 4\beta_{sec}^{\infty}(1 - \beta_{sec}^{\infty}) \qquad (121)$$

then the solutions of ([7]) and (0) coincide with overwhelming probability.
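Proof. Let $\theta_{sec} = 1$ in (94). Then from (94) we have

$$\alpha > \frac{2\Gamma\left(\frac{d+2}{2}\right)}{d\,\Gamma\left(\frac{d}{2}\right)} - \frac{1}{d}\left((1 - 2\beta_{sec})\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\right)^2. \qquad (122)$$

When $d \to \infty$ we have $\lim_{d\to\infty} \frac{1}{d}\left(\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\right)^2 = 1$. Then from (122) we easily obtain the condition

$$\alpha > 4\beta_{sec}(1 - \beta_{sec})$$

which is the same as the condition stated in (121). This therefore concludes the proof. □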

Remark: Of course, the statement of Corollary 4 could have been deduced trivially from Corollary 3. Namely, any attainable value of the strong threshold is an attainable value for the sectional threshold as well.

5.3 $d \to \infty$ - weak threshold

Reasoning as in the two previous subsections we have that $\theta_w = 1$ in (117) would provide a valid condition for computing the weak threshold. In turn choosing $\theta_w = 1$ in (117) gives us the following corollary of Theorem 5.

Corollary 5. (Weak threshold, $d \to \infty$) Let $A$ be a $dm \times dn$ measurement matrix in (UJ) with the null-space uniformly distributed in the Grassmannian. Let the unknown $x$ in (UJ) be $k$-block-sparse with the length of its blocks $d \to \infty$. Further, let the location and the directions of nonzero blocks of $x$ be arbitrarily chosen but fixed. Let $k, m, n$ be large and let $\alpha = \frac{m}{n}$ and $\beta_w^{\infty} = \frac{k}{n}$ be constants independent of $m$ and $n$. Assume that $d$ is independent of $n$. If $\alpha$ and $\beta_w^{\infty}$ satisfy

$$\alpha > \beta_w^{\infty}(2 - \beta_w^{\infty}) \qquad (123)$$

then the solutions of ([7]) and (0) coincide with overwhelming probability.
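Proof. Let $\theta_w = 1$ in (117). Then from (117) we have

$$\alpha > \frac{(1-\beta_w)d + \beta_w d}{d} - \frac{1}{d}\left((1-\beta_w)\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\right)^2. \qquad (124)$$

When $d \to \infty$ we have $\lim_{d\to\infty} \frac{1}{d}\left(\frac{\sqrt{2}\,\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\right)^2 = 1$. Then from (124) we easily obtain the condition

$$\alpha > \beta_w(2 - \beta_w)$$

which is the same as the condition stated in (123). This therefore concludes the proof. □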

The results for the strong, sectional, and weak threshold obtained in the three above corollaries are shown on figures in earlier sections as curves denoted by $d \to \infty$.
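
The conditions (118), (121), and (123) are quadratic in $\beta$, so for a given $\alpha$ the boundary value of $\beta$ can be written in closed form. The following sketch inverts them; the function names are hypothetical and chosen here only for illustration. For the strong and sectional conditions we take the smaller root of $4\beta(1-\beta) = \alpha$ (the branch with $\beta \le 1/2$), and for the weak condition the root of $\beta(2-\beta) = \alpha$ that lies in $[0, 1]$.

```python
import math


def beta_strong_limit(alpha: float) -> float:
    """Boundary beta of alpha > 4*beta*(1 - beta), on the branch beta <= 1/2."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - math.sqrt(1.0 - alpha)) / 2.0


def beta_sectional_limit(alpha: float) -> float:
    """Boundary beta of (121); the condition has the same form as (118)."""
    return beta_strong_limit(alpha)


def beta_weak_limit(alpha: float) -> float:
    """Boundary beta of alpha > beta*(2 - beta), with beta in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - alpha)


if __name__ == "__main__":
    for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(alpha, beta_strong_limit(alpha), beta_weak_limit(alpha))
```

For example, $\alpha = 0.5$ gives a boundary of about $0.146$ for the strong and sectional conditions and about $0.293$ for the weak one. Since the conditions are strict inequalities, these values are suprema of the admissible $\beta$; in this $d \to \infty$ regime the weak boundary is exactly twice the strong one.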

It is interesting to note that (119), (122), and (124) can be used instead of (69), (94), and (117) to determine attainable values of the thresholds for any fixed $d$. Given that (119), (122), and (124) are obtained for a suboptimal choice of $\theta$, the threshold values that they produce trail those presented on Figures 3, 4, and 5 and we therefore do not include them in this paper. However, we do mention that they are relatively easier to compute and a fairly good approximation of the results presented on Figures 3, 4, and 5.



6 Numerical experiments 

In this section we briefly discuss the results that we obtained from numerical experiments. In all our numerical experiments we fixed $n = 100$ and $d = 15$. We then generated matrices $A$ of size $dm \times dn$ with $m = (10, 20, 30, \dots, 90, 99)$. The components of the measurement matrices $A$ were generated as i.i.d. zero-mean unit variance Gaussian random variables. For each $m$ we generated $k$-block-sparse signals $x$ for several different values of $k$ from the transition zone (the locations of non-zero blocks of $x$ were chosen randomly). For each combination $(k, m)$ we generated 100 different problem instances and recorded the number of times the $\ell_2/\ell_1$-optimization algorithm from (0) failed to recover the correct $k$-block-sparse $x$. All different $(k, m)$ combinations as well as the corresponding numbers of failed experiments are given in Table 1. The interpolated data from Table 1 are presented graphically on Figure 6. The color of any point on Figure 6 shows the probability of having $\ell_2/\ell_1$-optimization succeed for a combination $(\alpha, \beta)$ that corresponds to that point. The colors are mapped to probabilities according to the scale on the right hand side of the figure. The simulated results can naturally be compared to the weak threshold theoretical prediction. Hence, we also show on Figure 6 the theoretical value for the weak threshold calculated according to Theorem 5 (and shown on Figure 5). We observe that the simulation results are in good agreement with the theoretical calculation.

Table 1: The simulation results for recovery of block-sparse signals; $n = 100$, $d = 15$

m                 10      20      30      40      50      60      70      80      90      99
k / # of errors   7/100   12/100  18/100  22/76   29/80   37/94   46/95   57/98   71/97   92/89
k / # of errors   6/100   11/98   17/100  22/76   29/80   36/64   45/71   55/60   69/70   90/52
k / # of errors   5/95    10/93   16/89   21/39   28/43   35/26   44/38   53/11   67/27   89/27
k / # of errors   4/14    9/21    15/36   20/5    27/11   34/6    43/11   52/2    66/11   88/12
k / # of errors   3/0     8/0     14/8    19/0    25/0    32/0    42/6    50/0    65/6    87/3
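
As a rough illustration of how the entries of Table 1 map to points in the $(\alpha, \beta)$ plane, the sketch below transcribes the table, converts each entry to an empirical failure rate, and reports for every $m$ the largest tested $k$ that failed in fewer than half of the 100 instances. The helper names and the 50% cut-off are our own choices, not the paper's. The comparison column uses the $d \to \infty$ boundary $1 - \sqrt{1-\alpha}$ implied by condition (123) purely as a convenient closed-form reference; it is not the $d = 15$ curve of Theorem 5 that the paper plots on Figure 6.

```python
import math

N = 100       # number of blocks (n in the paper)
TRIALS = 100  # problem instances per (k, m) combination

M_VALUES = [10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

# Each row lists (k, number of failures) for the m values above, as in Table 1.
TABLE_1 = [
    [(7, 100), (12, 100), (18, 100), (22, 76), (29, 80),
     (37, 94), (46, 95), (57, 98), (71, 97), (92, 89)],
    [(6, 100), (11, 98), (17, 100), (22, 76), (29, 80),
     (36, 64), (45, 71), (55, 60), (69, 70), (90, 52)],
    [(5, 95), (10, 93), (16, 89), (21, 39), (28, 43),
     (35, 26), (44, 38), (53, 11), (67, 27), (89, 27)],
    [(4, 14), (9, 21), (15, 36), (20, 5), (27, 11),
     (34, 6), (43, 11), (52, 2), (66, 11), (88, 12)],
    [(3, 0), (8, 0), (14, 8), (19, 0), (25, 0),
     (32, 0), (42, 6), (50, 0), (65, 6), (87, 3)],
]

# Map (m, k) to the empirical failure rate. Entries repeated in the table
# (for example m = 40, k = 22) carry identical counts, so overwriting is harmless.
failure_rate = {}
for row in TABLE_1:
    for m, (k, failures) in zip(M_VALUES, row):
        failure_rate[(m, k)] = failures / TRIALS


def largest_k_mostly_recovered(m, max_failure_rate=0.5):
    """Largest tested k for this m whose failure rate is below the cut-off."""
    ks = [k for (mm, k), rate in failure_rate.items()
          if mm == m and rate < max_failure_rate]
    return max(ks) if ks else None


def weak_beta_limit(alpha):
    """Boundary of alpha > beta*(2 - beta), i.e. the d -> infinity weak condition."""
    return 1.0 - math.sqrt(1.0 - alpha)


if __name__ == "__main__":
    for m in M_VALUES:
        alpha = m / N
        k_emp = largest_k_mostly_recovered(m)
        print(f"alpha={alpha:.2f}  empirical beta={k_emp / N:.2f}  "
              f"d->inf reference={weak_beta_limit(alpha):.3f}")
```

The cut-off is coarse because only a handful of $k$ values were tested per $m$, so the output should be read as a sanity check on the table rather than as an estimate of the threshold.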

Figure 6: Experimentally recoverable block-sparsity, $\ell_2/\ell_1$-optimization ($n = 100$, $d = 15$; horizontal axis: $\alpha$)



7 Discussion



In this paper we considered recovery of block-sparse signals from a reduced number of linear measurements. We provided a theoretical performance analysis of a polynomial $\ell_2/\ell_1$-optimization algorithm. Under the assumption that the measurement matrix $A$ has a basis of the null-space distributed uniformly in the Grassmannian, we derived lower bounds on the values of the recoverable strong, sectional, and weak thresholds in the so-called linear regime, i.e. in the regime when the recoverable sparsity is proportional to the length of the unknown vector. We also conducted numerical experiments and observed solid agreement between the simulated and the theoretical weak threshold.

The main subject of this paper was the recovery of the so-called ideally block-sparse signals. However, the presented analysis framework admits various generalizations. Namely, it can be extended to include computations of threshold values for recovery of approximately block-sparse signals as well as those with noisy measurements. Also, in this paper we were mostly concerned with the success of $\ell_2/\ell_1$-optimization. However, as we have mentioned earlier, instead of $\ell_2/\ell_1$-optimization one could use an $\ell_2/\ell_q$-optimization ($0 < q < 1$). While the resulting problem would not be convex it could still be solved (not necessarily in polynomial time) with various techniques from the literature. One could then potentially find an interest in generalizing the results of the present paper to the case of $\ell_2/\ell_q$-optimization ($0 < q < 1$) as well. On a completely different note, carefully following our exposition one could spot that the results presented in this paper assume large dimensions of the system. Obtaining their equivalents for systems of moderate dimensions is another possible generalization. All these generalizations will be part of future work.

We would like to reemphasize that our analysis heavily relied on a particular probability distribution of the null-space of the measurement matrix. On the other hand our extensive numerical experiments (results of some of them are presented in [83]) indicate that $\ell_2/\ell_1$-optimization works equally well for many different statistical measurement matrices $A$ (e.g. Bernoulli). It will be interesting to see if the analysis presented here can be generalized to these cases as well. Furthermore, as in [33], one can raise the question of identifying the class of statistical matrices for which $\ell_2/\ell_1$-optimization works as well as in the case presented in this paper. However, we do believe that answering this question is not an easy task.

As far as the technical contribution goes, we should mention that our analysis made critical use of the excellent work [47], which in turn relied heavily on the phenomenal results of [20, 67] related to the estimates of the normal tail distributions of Lipschitz functions. In a very recent work related to matrix-rank optimization the authors of [69] successfully conducted a theoretical analysis applying the results of [20, 67] without relying on the conclusions of [47]. It will certainly be interesting to see what performance guarantees the direct application of the results of [20, 67] would produce in the problems considered in this paper.

Lastly, it is relatively easy to note that the signal structure imposed in this paper is very simple, i.e. almost ideal. For example, we assumed that all blocks are of the same length. Just slightly modifying that assumption so that the blocks are not of equal length significantly complicates the problem. It will be interesting to see if algorithms similar to $\ell_2/\ell_1$-optimization can be used for signals with these (or possibly even some completely different) structures and if an analysis similar to the one presented in this paper can be developed for them as well.